AI The Internet

iFixit CEO Takes Shots At Anthropic For 'Hitting Our Servers a Million Times In 24 Hours' (pcgamer.com) 48

Yesterday, iFixit CEO Kyle Wiens asked AI company Anthropic why it was clogging up their server bandwidth without permission. "Do you really need to hit our servers a million times in 24 hours?" Wiens wrote on X. "You're not only taking our content without paying, you're tying up our DevOps resources. Not cool." PC Gamer's Jacob Fox reports: Assuming Wiens isn't massively exaggerating, it's no surprise that this is "tying up our DevOps resources." A million "hits" per day would do it, and would certainly be enough to justify more than a little annoyance. The thing is, putting this bandwidth chugging in context only makes it more ridiculous, which is what Wiens is getting at. It's not just that an AI company is seemingly clogging up server resources, but that it's been expressly forbidden from using the content on its servers anyway.

There should be no reason for an AI company to hit the iFixit site because its terms of service state that "copying or distributing any Content, materials or design elements on the Site for any other purpose, including training a machine learning or AI model, is strictly prohibited without the express prior written permission of iFixit." Unless it wants us to believe it's not going to use any data it scrapes for these purposes, and it's just doing it for... fun?

Well, whatever the case, iFixit's Wiens decided to have some fun with it and ask Anthropic's own AI, Claude, about the matter, saying to Anthropic, "Don't ask me, ask Claude!" It seems that Claude agrees with iFixit, because when it's asked what it should do if it was training a machine learning model and found the above writing in its terms of service, it responded, in no uncertain terms, "Do not use the content." This is, as Wiens points out, something that could be seen if one simply accessed the terms of service.

  • by thegarbz ( 1787294 ) on Thursday July 25, 2024 @05:47PM (#64655996)

    Non-gated terms of service aren't worth anything. That has been tested in court. The copyright question hasn't, but a claim is likely to fail, as this would be a variant of "we restrict your ability to see and learn", which would fall afoul of literally all exceptions to copyright such as transformative works, satire, etc.

    But terms of service. Unless you can show that the AI agreed to it, it's not likely to be bound by it. You need to gate the content behind an acceptance form for that to stick legally.

    • They make literal copies into the training set. No transformation, no explicit license, no implied license, no copyright exception under the DMCA, not transient copies ... and a piss poor argument for fair use, but poor as it is, it's their only hope.

      • I can copy all the webpages I want so I can go back and learn from them later. This is one of the primary fair uses stipulated by the Berne convention.

      • Member when Slashdot was filled with people who wanted information to be free [wikipedia.org], shat on stupid crony-capitalism copyright rules and hated it when companies put their shit behind paywalls etc.? I 'member.

        I know, I know. Hating Sam Altman et al. is much more satisfying than hating all the paywalling and DRMing content providers. It's just pretty clear that this hatred is clouding a lot of judgment here. A lot of the argumentation here could be straight out of the old playbooks of the RIAA and the MPAA. Somehow

        • by cob666 ( 656740 )

          But in the end, information still wants to be free..

          Sure, raw information might want to be free. However, if I spend my time compiling this 'free' information and then after hours of studying and cross referencing other sources of 'free' information I publish my opinion along with an analysis of this aggregated data on a web site that charges for access to this content, because somehow, I have to be paid for my time, then why should anyone have unfettered access to that content?

          • because somehow, I have to be paid for my time

            "The true solution is going to be something where the archaic business models are no longer deemed essential for survival"

      • They make literal copies into the training set.

        Citation required. They make copies as much as you are making a copy of this post by storing it in your RAM. Data used for training is not copied or used in its original form. Again it hasn't been fully tested yet but I would guarantee you that a trained model is considered transformative since it bears zero resemblance to the original data, despite on occasion being able to spit out an original line.

        I suspect this varies greatly by AI model too. Google for example seems to have trained its AI to copy and

    • by phantomfive ( 622387 ) on Thursday July 25, 2024 @08:45PM (#64656296) Journal
      The AI isn't scraping the pages from the internet. The company is scraping the pages, putting them in a database, and using the database to train the AI.
      • And if I click File > Save As on this page and copy your post literally nothing you can put on here in copyright or terms of service would make that process illegal for personal use.

        The ability to copy data from one place to another underpins how computers fundamentally function. Saving your post to disk, putting it into a database, or simply having it in my RAM is all the same under current copyright law. The question is what do I do with it. Do I show it to someone else? Give access to so

    • No, because the default position in the absence of a terms of service is that you are not allowed to copy it at all.

      • No, because the default position in the absence of a terms of service is that you are not allowed to copy it at all.

        No. No, there is no such thing as a default terms of service. In fact terms of service have zero to do with copying anything whatsoever. But let's assume there is. Define copy, and while you're at it, provide a legal citation. Did you copy this post by reading it? If not, then how did you read it? It would have to be verbatim in your RAM for your consumption to be translated in some way to display it on your screen.

        • Copyright law says you can't copy without getting permission from the copyright holder. The terms of service is the only thing that gives you that permission. If you don't agree to them, you don't have permission.

          • Copyright has zero to do with terms of service. Zero. NONE. Nothing at all. They are two legally distinct concepts with different purposes. Terms of service may be used to apply a certain copyright to material covered under it, but again the media must be gated to do that and the user must agree to the terms. If you give them access without them agreeing then terms of service do not apply.

            Copyright as well has plenty of carve outs for all manner of required use and there's no such clause in copyright law th

    • by gweihir ( 88907 )

      as this would be a variant of "we restrict your ability to see and learn"

      Bullshit. Machines cannot "see" or "learn". The law and the courts see and recognize that, even if you do not.

      • The law and the courts see and recognize that,

        Citation required. Should be fairly easy for you, court results backing up your case are openly published.

        Also you seem to be hung up on words. Maybe next time read the words which precede them such as "variant of". In the English language that means the following is not meant to be taken as literal in the way it is applied to humans. That was your English lesson for today. Now when you have taken your autism medication substitute the word learn with the word train and try and understand the sentence agai

  • by ctilsie242 ( 4841247 ) on Thursday July 25, 2024 @05:48PM (#64656006)

    I wonder if it is possible for ifixit to change content if it detects an AI site, then feed it poisoned images.

    This is easy to do. I know a small business who had someone else's site linking directly to their pictures, so with some scripting and sending different images depending on the Referer header, normal visitors saw the normal stuff, while the offending site got pornographic images.
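
    The Referer check described above can be sketched roughly as follows. This is only an illustration of the idea, not the script that business actually ran: the host names, file paths and the `pick_image` helper are all hypothetical, and the replacement here is a plain notice image. Note that the header is optional and trivially spoofed, so it identifies hotlinking pages, not determined scrapers.

```python
from urllib.parse import urlparse

# Hypothetical values, for illustration only.
OWN_HOSTS = {"shop.example", "www.shop.example"}
NORMAL_IMAGE = "images/product.jpg"
HOTLINK_IMAGE = "images/hotlink-notice.png"


def pick_image(referer):
    """Choose which file to serve for an image request, based on its Referer."""
    if not referer:
        # Many browsers and privacy tools omit the header entirely,
        # so a missing Referer is treated as a normal visitor.
        return NORMAL_IMAGE
    host = urlparse(referer).hostname
    if host in OWN_HOSTS:
        # The image is embedded in one of our own pages.
        return NORMAL_IMAGE
    # The image is embedded in somebody else's page.
    return HOTLINK_IMAGE
```

    A web server would call something like `pick_image(request_headers.get("Referer"))` before opening the file; most servers can also express the same rule directly in their configuration.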

    • by luvirini ( 753157 ) on Thursday July 25, 2024 @05:56PM (#64656016)

      >I know a small business who had someone else's site linking directly to their pictures, so with some scripting and sending different images depending on the Referer header, normal visitors saw the normal stuff, while the offending site got pornographic images.

      Funnier would be a picture with a banner saying something like:
        "We (the linkers) are so clueless that we do not know how to host pictures, so you should not buy from us" or
        "We (the linkers) are assholes that want to use other people's images without attribution or permission" and so on

    • Far better than porn is to poison their data. Identify them by IP address, or something, and return text that is vaguely plausible but is nonsense that when fed into their AI will reduce the quality.

      This is similar to a Honeypot [wikipedia.org] that you might use to confuse crackers who are trying to steal your data. I do not know how much effort there would be in creating enough nonsense to damage Anthropic's AI models; they should also stick a suitable robots.txt and a notice saying "Do not enter Anthropic" to protect themselves legally.
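
      As a rough sketch of what a "suitable robots.txt" might look like, and how a well-behaved crawler would interpret it, the snippet below uses Python's standard `urllib.robotparser`. The `ClaudeBot` user-agent token is an assumption that should be checked against the crawler operator's own documentation, and the URL is a placeholder. robots.txt is advisory only: it does nothing against a crawler that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: shut out one named crawler, allow everyone else.
# "ClaudeBot" is an assumed user-agent token, not taken from the thread.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

      With these rules, `may_fetch("ClaudeBot", "https://www.example.com/Guide/1")` is refused while an ordinary browser user agent is allowed.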

      • Far better than porn is to poison their data.

        Won't do anything. What we've already seen is by attempting to filter out porn the only thing AI models have achieved is a poorer reproduction of human anatomy. Companies are in some cases actively filtering it out. This won't poison anything.

        If you want to really poison AI, feed it its own output. There's a story a few above this one that talks about what havoc that causes for the training models.

    • This is easy to do. I know a small business who had someone else's site linking directly to their pictures, so with some scripting and sending different images depending on the Referer header, normal visitors saw the normal stuff, while the offending site got pornographic images.

      Back in the early 2000s, when I worked at a university, one of the local neo-nazi (as in literal 'heil hitler' skinhead types) cults was trying to doxx the union president by directly linking to a photo of him onto their weird nazi bull

    • by cob666 ( 656740 )
      Awesome! I'm sure they could make a change to a routing table and based on source IP address or header send requests to a honey pot of crazy or just random content.
  • It seems that the creation - in this case Claude - has surpassed its nominal creator both intellectually and morally. Of course, that may not imply true intelligence and conscience in Claude, but rather a lack of those attributes in Anthropic.

    Still, when your fucking server farm is a better, more conscientious, and more ethical citizen than you are, maybe it's time for you to eat a bullet - or take an overdose - and shuffle off this mortal coil. Please feel free to do so. If you need help with that endeavou

  • by Rosco P. Coltrane ( 209368 ) on Thursday July 25, 2024 @06:37PM (#64656090)

    Admittedly, it was a long time ago. But back in the early aughts, I used to run servers on the significantly slower intarpipes of the day, and I had scripts to throttle abusive IPs doing mass downloads.

    Don't complain, iFixit, fight back. And if I were you, I'd quietly switch to serving fake pages and incorrect information to AI companies siphoning off my content without my permission to build their businesses on my back.
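
    For readers wondering what such a throttling script might look like today, here is a minimal sliding-window sketch in Python. It is not the poster's original script; the class name, limits and addresses are illustrative assumptions, and a real deployment would more likely sit in the web server or a reverse proxy.

```python
import time
from collections import defaultdict, deque


class SlidingWindowThrottle:
    """Allow at most max_requests per client IP within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now=None):
        """Record a request from ip and return True if it should be served."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: throttle this request
        hits.append(now)
        return True
```

    For scale, a million requests in 24 hours averages about 11.6 requests per second (1,000,000 / 86,400), so a per-IP limit only helps if the traffic comes from a small number of addresses.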

    • Back in the day a person's IP address identified them uniquely. These days you may inadvertently catch plenty of others in your actions.

  • You're simply defending yourself from an attempted DoS attack.

  • by Arrogant-Bastard ( 141720 ) on Thursday July 25, 2024 @06:41PM (#64656098)
    A lot of people running a lot of operations are seeing similar behavior from a lot of AI companies. Not only are they ignoring robots.txt, not only are they ignoring the terms-of-service, not only are they ignoring copyright, they're using giant server farms (including those of irresponsible companies like Amazon and Google) to absolutely hammer web sites AND to hammer them repeatedly.

    I've been in touch with librarians, archivists, and others who have been forced to deal with the fallout from this. And they're not happy. Which means that a backlash is coming. Some of that will take the form of firewall rules, some of it will take the form of litigation, and some of it will take the form of greatly increased restrictions on access to content.

    It's that last one that's the tragedy for the open Internet. Once again, as we've seen with email, as we've seen with Usenet, as we've seen with one thing after another, the boundless, insatiable, insane greed of people who are willing to destroy a thing IF they can make money off it before it disappears determines the outcome. I expect that it'll turn out the same way here, because the billions of dollars being poured into AI companies are sufficient to hire a lot of good attorneys and they'll simply delay the outcome until it doesn't matter any more: the damage will be done.
    • by Lehk228 ( 705449 )
      start feeding them mangled nonsense, once their servers are detected send them b-reel bloopers and other nonsense instead of the stated video
    • Not only are they ignoring robots.txt, not only are they ignoring the terms-of-service, not only are they ignoring copyright

      While it's a shitty thing to do it's worth noting that terms-of-service can't legally bind to open content accessible without agreeing to the terms of service. There's a reason terms of service gate the content in order to be valid. That has been tested in court.

      As for copyright, using something for training data is as yet untested but also unlikely to fall under actual copyright. What may be argued about copyright is the *output*, but you can't accuse me of copying Leonardo da Vinci simply because I looked

  • Certainly, they should respect terms of service - although, hopefully robots.txt is being used too.

    That said ... you have a massively popular internet site. Surely you use a CDN, firewall, various caching layers?

  • by thesjaakspoiler ( 4782965 ) on Thursday July 25, 2024 @07:01PM (#64656134)

    Now that would be fun.

  • Just gather the IP addresses they are scraping from and block them in your firewall. Problem solved.
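
    A minimal sketch of the matching step, assuming the offending addresses have already been collected from the access logs. The networks below are reserved documentation ranges used as placeholders, not real crawler addresses, and `is_blocked` is a hypothetical helper; an actual firewall would apply the same test in its own rule syntax.

```python
import ipaddress

# Placeholder blocklist: one whole range and one single address.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.42/32")
]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if ip falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

    As noted earlier in the thread, addresses are often shared these days, so a broad range can catch unrelated visitors along with the crawler.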
