Facebook AI

'Torrenting From a Corporate Laptop Doesn't Feel Right': Meta Emails Unsealed (arstechnica.com)

An anonymous reader shares a report: Newly unsealed emails allegedly provide the "most damning evidence" yet against Meta in a copyright case raised by book authors alleging that Meta illegally trained its AI models on pirated books.

Last month, Meta admitted to torrenting a controversial large dataset known as LibGen, which includes tens of millions of pirated books. But details around the torrenting were murky until yesterday, when Meta's unredacted emails were made public for the first time. The new evidence showed that Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries through the site Anna's Archive, including at least 35.7 terabytes of data from Z-Library and LibGen," the authors' court filing said. And "Meta also previously torrented 80.6 terabytes of data from LibGen."

"The magnitude of Meta's unlawful torrenting scheme is astonishing," the authors' filing alleged, insisting that "vastly smaller acts of data piracy -- just .008 percent of the amount of copyrighted works Meta pirated -- have resulted in Judges referring the conduct to the US Attorneys' office for criminal investigation."

  • jail time (Score:5, Insightful)

    by hjf ( 703092 ) on Friday February 07, 2025 @10:36AM (#65149581) Homepage

    And no one will receive jail time, probation, or anything.

    There will be a fine of probably a fraction of a percent of Meta's daily revenue.

    And a few lawyers will make a ton of money. That's it.

    • I propose the $250,000 fine per infringed work.

      • Perhaps the law is unjust, perhaps unconstitutional. In my moral universe: the information should have been free and should remain free. Freely taken, freely disseminated. The "walled garden" is the evil, not the information, nor its use.

        • So your work should be free ?

          Good to know, come by my house, I have some yard work that needs doing.

          • GP: "the information should have been free and should remain free"
            You: "Yard work"

            Such logic, much wow.

            • The parent actually had a very logical rebuttal to your point (if you make all knowledge freely available, no one can get paid for knowledge work) ... but because they made it in a clever way (using a non-knowledge work example of yard work), it evidently went over your head.

              Meanwhile, your response was on the logical level of "no you're a poopy head".

              • by ceoyoyo ( 59147 )

                if you make all knowledge freely available, no one can get paid for knowledge work

                RIP open source.

              • 1. It was not MY point. For your information, I don't agree with it either.
                2. It is still a logical fallacy, because the rebuttal looks at a different subject, and it's not clever at all, quite the contrary. It's called deflection and it's rather primitive.

          • Is copying the same thing as enslavement?

            Mark Twain planned to extend his books every seven years so people would want to buy his copy.

            I'll always buy an original from the author. Creator's Mark moves this into the Fraud category which is already a crime.

            That men with guns will cage people who threaten your great-grandchildren's rent seeking isn't the moral high ground you think it is.

            In the case of Facebook (whom I loathe) no revenue was lost through their actions.

            These are all separate situations. Confl

            • Re:jail time (Score:4, Insightful)

              by N1AK ( 864906 ) on Friday February 07, 2025 @12:06PM (#65149899) Homepage
              Did Mark Twain have that plan at a point where copies of his works could be made by anyone and distributed as physical and digital copies within hours of release if it weren't for copyright protections? If not, it seems a little redundant to claim it provides insight in a completely different context.

              How definitive your claim about Facebook is says a lot about the lack of consideration you've given the issue before commenting. Facebook are investing billions in this area and are paying very generously for some of the data they use. If they hadn't torrented the works and their options were to spend some money or not have the material, they would happily have spent a large amount of money for it.

              I'm a long way from happy with copyright law as it stands, but arguments entirely against it need to be a lot more persuasive than those.
              • No, and _still_ 7 years was enough.
              • by unrtst ( 777550 )

                Did Mark Twain have that plan at a point where ...

                Dude, Mark Twain is dead, and some of his works are still under copyright. Do you think he expected that?

                If they hadn't torrented the works and their options were to spend some money or not have the material, they would happily have spent a large amount of money for it.

                That's some bullshit right there! "they would happily have spent a large amount of money for it" BULLSHIT! If that were true, they would have done that. Even putting that aside, do you have any clue as to how much material is available through LibGen, and how much of that is not available for purchase at all?

                Please note, I'm not claiming that justifies Facebook's actions, but this wasn't theft (it was co

            • by tattood ( 855883 )

              In the case of Facebook (whom I loathe) no revenue was lost through their actions.

              If Facebook had purchased a copy (and perhaps paid a license fee from the authors) then they would have been fine to use the works for their AI. But since they torrented pirated copies, the authors were denied that revenue.

              • by ceoyoyo ( 59147 )

                It does seem very shortsighted of Facebook. I don't expect it would have cost them a significant fraction of what they spent on their LLM program to have just downloaded the public domain archives on the Internet and called up some major publishers for a license for the rest. This is all likely an oversight caused by OMG CATCH UP WITH OPENAI NOW!

        • by whitroth ( 9367 )

          So, all the days and weeks I spent writing my first two now-published (by a small press, not self-published) novels should be unpaid labor? And how do I pay my bills, and how does the small press stay in business?

        • Are you the kind of person who whines about paying for labor? Like the shop rate is $90/hr, but you should pay less because you're special?

    • by Kisai ( 213879 )

      I doubt it.

      Copyright holders have too much power in the US (second only to Japan), and the minute someone who knows their work was in Z-Library/LibGen comes forward with a right to sue, every single person who has downloaded LLaMA or any other LLM from Facebook is going to find their LLM unusable. And I'm pretty damn sure OpenAI did the same thing.

    • Exactly this. If you're in a technology race with other companies to be the top dog in one of the only (perceived) new frontiers of business, it's way cheaper to ignore a few laws and pay some dinky fine than it is to lose the race and get left behind the competition.

    • This is why, in corporate cases like this, I hope someone will take action in the EU.
      Seems like over here is the only place which keeps Big Corp reasonably in check.

      • It's true. The EU is the last glimmer of a chance for humanity to escape slavery to the oligarchy.

    • Re:jail time (Score:5, Interesting)

      by fph il quozientatore ( 971015 ) on Friday February 07, 2025 @11:10AM (#65149709)
      It's sad to think that Aaron Swartz was prosecuted and driven to suicide for basically the same crime, while for Meta it's just another Tuesday.
    • Re:jail time (Score:5, Insightful)

      by bill_mcgonigle ( 4333 ) * on Friday February 07, 2025 @11:26AM (#65149765) Homepage Journal

      Corporate jail time is certainly possible. Their operations can be halted for 90 days or whatever the term is.

      Natural People are 100% vulnerable to jail time yet the Courts conclude that the Corporations have all of the rights of a Person and none of the liabilities (other than garnishing money).

      We can't have /Citizens United/ and immortal immune psychopathic corporations.

    • None of them should face jail time or even fines. An AI's work is as transformative as the human mind's. Just as you're capable of remembering specific parts of a novel, so is an AI.
      • Torrenting the works is a bit different from just "reading" and "remembering" the information and incorporating it into new creative works. As part of the torrent process they also uploaded and shared the raw files with other torrenters. So even if you give them a pass on how they use the data after having "read" it, torrenting raises the issue of sourcing the raw, copyrighted, pirated files to others. It is the aspect of illegally uploading copyrighted materials through the torrent process that is seen

  • by RitchCraft ( 6454710 ) on Friday February 07, 2025 @10:38AM (#65149585)

    Well let's see how this is handled. If nothing results from this and Zuck isn't personally fined HUGE for this (or even better jailed) then that sends a clear message that piracy is an acceptable form of obtaining digital material. My gut feeling tells me a small fine (slap on the wrist) is coming for poor Zuck. Hoist the sails mate!

    • Well let's see how this is handled

      Wanna take a guess?

      If nothing results from this and Zuck isn't personally fined HUGE for this (or even better jailed)

      He won't be.

      then that sends a clear message that piracy is an acceptable form of obtaining digital material.

      No, it sends the clear message that crime committed by large corps is acceptable. If you try it as the little guy you will be destroyed utterly. Remember what happened with the Sony rootkit?

      But I suspect you already knew that.

      • "Remember what happened with the Sony rootkit?" - yep, that is when Sony products ceased to exist in my world.

    • by Alumoi ( 1321661 )

      Piracy is an acceptable form of obtaining digital material IF you are a big corporation. So, what's new?

  • by Pseudonymous Powers ( 4097097 ) on Friday February 07, 2025 @10:40AM (#65149591)
    The phrasing here, top to bottom, seems to imply that the act of torrenting is illegal in itself, when in fact it's the content of what they torrented that makes the act illegal.
    • Well, it's both: it's torrenting (especially the DISTRIBUTING-without-permission part) of works that are under copyright (a protected monopoly on distributing the works).

      • Well it's both, it's torrenting (especially the DISTRIBUTING without permission part)

        There are plenty of works that have licenses which allow distribution, regardless of the distribution method.

    • by ls671 ( 1122017 )

      The phrasing here, top to bottom, seems to imply that the act of torrenting is illegal in itself, when in fact it's the content of what they torrented that makes the act illegal.

      Not sure downloading copyrighted content from a torrent is even illegal. I think it's distributing it that is.

      I know for sure that companies scanning torrent traffic for the movie industry don't file any complaint against somebody until he has the complete torrent and is seeding it, thus distributing it.

      • by GrahamJ ( 241784 )

        yeah this. I hate fuckerbook as much as the next sane individual but at least where I'm from downloading isn't illegal. It's easy enough to configure your torrent client to not seed or serve anything to peers.

        What's interesting is unlike, say, normal consumer movie piracy, this content isn't even being consumed in the usual sense. No one read these downloaded books. It's still a legal grey area as to whether training on such content is a copyright violation.

  • Copyright is theft of the commons. AI shall be RobInfoHood.

    In 1787, James Madison submitted a provision to the Framers of the U.S. Constitution to "secure to literary authors their copyrights for a limited time."

    In 1790, U.S. copyright law granted authors a monopoly right over their creations for 14 years, with the option of renewing that monopoly for another 14.

    Article I, Section 8: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

  • by itsme1234 ( 199680 ) on Friday February 07, 2025 @10:42AM (#65149605)

    Double-digit TBs?! Just relatively recently Google pulled the plug on the unlimited G Suite plans, where people were storing multiple PBs. Yes, PBs: probably about all the (video) streaming content from all services and all vaguely popular BDs and DVDs ever ripped and put on p2p. Of course, all the music one could find and the shadow libraries are a rounding error here.

    Just for kicks look for the guy using over one PB on Amazon Cloud Drive (rest in peace) all the way back in 2017.

    • There is a HUGE difference between the number of books per unit of storage space compared to the number of movies per unit of storage space. It is NOT about the number of bits that were torrented, rather it is about the number of individual works that were torrented.

      • Yes, as mentioned the books are just a rounding error TB-wise but anyone can torrent more books than a huge college library.

    • I think they mean a corporation using a torrent that large for infringement purposes. It probably happens all the time but this is the first one to get caught.

      • Yes, THIS. Especially with these AI things, when they are seriously discussing opening nuclear power plants to feed them electricity, this is one of the most basic, high-quality resources they could tap. That it's illegal to obtain it, and possibly to use it the way they do... it's probably a matter of asking for forgiveness instead of permission. That is not even considering the scenario they are high on, that they might get to have something more powerful than all the atomic bombs in the wo

  • Every single major AI model has been trained on pirated data. I would guess that MOST of the data used to train these big AI models violates copyright or privacy laws, somewhere, somehow.

    And now we have court systems in tiny countries ordering the big internet companies to make worldwide changes or face trilllyyuuuuns of dollars of fines.

    Let's just sue and prosecute everyone, for everything.
  • by Vegan Cyclist ( 1650427 ) on Friday February 07, 2025 @10:53AM (#65149653) Homepage

    I just hope that there's a Metallica book in there, and Lars loses his shit like he did last time something from Metallica was pirated.

  • If they made sure that their upload to download ratio was at least 1:1, then they are good... /s

  • by greytree ( 7124971 ) on Friday February 07, 2025 @11:04AM (#65149681)
    Until copyright terms are made a fair 5 years, not the gross 95 years Disney have made them, it is morally right to torrent copyrighted materials.
    • by timholman ( 71886 ) on Friday February 07, 2025 @11:25AM (#65149755)

      Until copyright terms are made a fair 5 years, not the gross 95 years Disney have made them, it is morally right to torrent copyrighted materials.

      As an author you could easily be out-lawyered for 5 years by a large company. No, the fair length of copyright is 14 years, with the option of renewing for an additional 14 years, as established by the Copyright Act of 1790.

      That gives the creator ample time to make money from their creation without publishers or Hollywood studios using delaying tactics on authors to wait until the copyright expires, and then using their work without paying a dime.

      Disney's 95 year act (thanks to Sonny Bono) needs to be repealed. But there is at least poetic justice in the fact that Disney's efforts at perpetual copyright have led in no small part to the complete creative bankruptcy of the Star Wars and Marvel franchises under the Disney umbrella. At least we can enjoy the schadenfreude of seeing Disney lose hundreds of millions of dollars each year on movies and TV shows that no one is watching.

  • You're beating up the wrong guy here. Meanwhile do you think Google/Amazon/Apple Book databases were not tapped to train their models?
    Furthermore, here is the real crime: 1% of the well known authors get 99% of the money. That is to say: 99% of those book authors are probably glad the AI might spit out a reference to their obscure work.
  • by butlerm ( 3112 ) on Friday February 07, 2025 @11:15AM (#65149719)

    I have no idea what they have in mind, but there are a few things that could be considered fair use here, including making an index of the papers in question and calculating secure hashes like SHA-256 for all of them. I would not consider ingesting all of them into some quasi-delusional LLM AI model to be fair use, though that question has yet to be decided. TL;DR: copying things, even in volume, is not necessarily a copyright violation.
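    The two examples butlerm gives (an index of the files plus a SHA-256 digest of each) are easy to picture in code. Below is a minimal Python sketch, assuming the files already sit in a local directory; `sha256_of_file` and `build_index` are hypothetical names for illustration, not anything from the case. It reads each file in chunks and records only its relative path, size, and digest.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream the file so a large file never has to fit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: Path) -> dict[str, dict[str, object]]:
    """Map each file's path (relative to root) to its size and SHA-256 digest.

    Hypothetical helper: it stores metadata only, never the file contents.
    """
    index: dict[str, dict[str, object]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():  # skip directories, index regular files only
            rel = path.relative_to(root).as_posix()
            index[rel] = {
                "size": path.stat().st_size,
                "sha256": sha256_of_file(path),
            }
    return index
```

    Calling `build_index(Path("some_dir"))` (the directory name is a placeholder) returns a plain dict that could be written out with `json.dumps`. Two identical files produce the same digest, which is what makes such an index useful for spotting duplicates without keeping the texts themselves.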

    • by mckwant ( 65143 ) on Friday February 07, 2025 @11:44AM (#65149817)

      , at least generally. Four points to it, cite below:

      - Purpose of the use - Commercial v. educational or not for profit
      - Nature of the work used - Technical documentation v., fictional novels
      - Proportion of the work used - Five lines from a sonnet differs from five sentences from LOTR
      - Effect of the use on the commercial marketability of the work - Probably negligible in most cases

      IANAL, which is where these things end up, but Meta's arguments on "Purpose" and "Proportion" aren't readily apparent to me, even assuming they kept careful track of what they were hoovering.
      --
      https://www.copyright.gov/fair... [copyright.gov]

      • by XanC ( 644172 )

        But torrenting is more than just using or making a copy for whatever use you have in mind. Doesn't it involve distributing the work as well?

        • by butlerm ( 3112 )

          That is a good point, so the question is did Meta have good reason to believe that the other people who were participating in the torrents were acquiring the data in a way that was illegal or violated copyright law? I imagine they may have, but the government would probably have to prove that to demonstrate that they were guilty of contributory copyright infringement or some other violation. Copyright holders like to make the case that Internet Service Providers are guilty of contributory copyright infrin

        • YES! Courts have ruled against peer-to-peer downloaders because they are helping distribute, not merely copying for themselves; furthermore, those defendants made zero profit from the infringement. Meta is doing way more, but they are a corporation, so the key is to incorporate your whole family, make everybody an employee, then get a corporate defense lawyer... when you lose, just bankrupt the corporation; nobody gets hurt.

  • by Anonymous Coward

    Now all you nerds understand what 2A enthusiasts experience when reading the news.

    If you read 81 TB of torrents and thought Those are rookie numbers. You've gotta pump those numbers up [kym-cdn.com], now you know what I think every time I read a news story about someone with an "arsenal" consisting of 5 or 6 guns and a few thousand rounds. Rookie numbers.

  • At best they might get fined $1M, which is a tiny drop in the bucket to them. He kissed the ring, so I'm sure he/they have complete immunity.
  • Isn't the point of having all this digital, copyable, downloadable, torrentable data available exactly so that people can copy it, download it, torrent it? That's one of the brilliant things about digital media.
  • There seems to be an idea that the easier it has become to copy things, the longer copyright durations should be.... how easy it is to pirate is often brought into arguments about copyright terms...

    This is a very strange idea that is for some reason taken for granted?

    It's also easier for the author to make and sell copies. If anything, durations should be shorter because of how rapidly the author can distribute.

    How about we go back to 7 years.
  • We got a nasty-gram from Spectrum at a place I worked before because someone was torrenting Ted Lasso on our guest Wi-Fi network that's for contractors in our office building.
  • But it's not necessarily immoral. It's obviously not nice to torrent something made by a small-time individual who relies on people paying for the media to actually make ends meet.
    But I consider it perfectly moral to torrent from the likes of Hollywood studios and big-wig authors who can afford to have me not financially supporting them.
    After all, Hollywood is an industry built off exploiting everyone you can. All the sex scandals, all the allegations, all the drugs, laundering, and trafficking. Do you
  • What I would love to see is for Meta to claim that it was the AI that made them do it. Blame the AI and then claim that because they were just following the AI's orders it is the AI that should be sent to jail. And they didn't read any of the content, only the AI did so it should be the one to be punished.
    And to show just how angry we are with the AI, here, we put it on an SSD, you can take it and put it in jail for the next 2000 years.
