'Torrenting From a Corporate Laptop Doesn't Feel Right': Meta Emails Unsealed (arstechnica.com)
An anonymous reader shares a report: Newly unsealed emails allegedly provide the "most damning evidence" yet against Meta in a copyright case raised by book authors alleging that Meta illegally trained its AI models on pirated books.
Last month, Meta admitted to torrenting a controversial large dataset known as LibGen, which includes tens of millions of pirated books. But details around the torrenting were murky until yesterday, when Meta's unredacted emails were made public for the first time. The new evidence showed that Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries through the site Anna's Archive, including at least 35.7 terabytes of data from Z-Library and LibGen," the authors' court filing said. And "Meta also previously torrented 80.6 terabytes of data from LibGen."
"The magnitude of Meta's unlawful torrenting scheme is astonishing," the authors' filing alleged, insisting that "vastly smaller acts of data piracy -- just .008 percent of the amount of copyrighted works Meta pirated -- have resulted in Judges referring the conduct to the US Attorneys' office for criminal investigation."
jail time (Score:5, Insightful)
And no one will receive jail time, probation, or anything.
There will be a fine of probably a fraction of a percent of Meta's daily revenue.
And a few lawyers will make a ton of money. That's it.
Re: (Score:1)
I propose a $250,000 fine per infringed work.
Re: (Score:1)
Perhaps the law is unjust, perhaps unconstitutional. In my moral universe: the information should have been free and should remain free. Freely taken, freely disseminated. The "walled garden" is the evil, not the information, nor its use.
Re: (Score:3)
So your work should be free?
Good to know, come by my house, I have some yard work that needs doing.
Re: (Score:2)
GP: "the information should have been free and should remain free"
You: "Yard work"
Such logic, much wow.
Re: (Score:3)
The parent actually had a very logical rebuttal to your point (if you make all knowledge freely available, no one can get paid for knowledge work) ... but because they made it in a clever way (using a non-knowledge work example of yard work), it evidently went over your head.
Meanwhile, your response was on the logical level of "no you're a poopy head".
Re: (Score:2)
RIP open source.
Re: (Score:2)
Re: (Score:2)
1. It was not MY point. For your information, I don't agree with it either.
2. It is still a logical fallacy, because the rebuttal looks at a different subject, and it's not clever at all, quite the contrary. It's called deflection and it's rather primitive.
Re: (Score:2)
Is copying the same thing as enslavement?
Mark Twain planned to extend his books every seven years so people would want to buy his copy.
I'll always buy an original from the author. Creator's Mark moves this into the Fraud category which is already a crime.
That men with guns will cage people who threaten your great-grandchildren's rent seeking isn't the moral high ground you think it is.
In the case of Facebook (whom I loathe) no revenue was lost through their actions.
These are all separate situations. Confl
Re:jail time (Score:4, Insightful)
How definitive your claim about Facebook is says a lot about the lack of consideration you've given the issue before commenting. Facebook are investing billions in this area and are paying very generously for some of the data they use. If they hadn't torrented the works, and their options were to spend some money or not have the material, they would happily have spent a large amount of money on it.
I'm a long way from happy with copyright law as it stands, but arguments entirely against it need to be a lot more persuasive than those.
Re: (Score:2)
Re: (Score:2)
Did Mark Twain have that plan at a point where ...
Dude, Mark Twain is dead, and some of his works are still under copyright. Do you think he expected that?
If they hadn't torrented the works and their options were spend some money or not have the material they would happily have spent a large amount of money for it.
That's some bullshit right there! "they would happily have spent a large amount of money for it" BULLSHIT! If that were true, they would have done that. Even putting that aside, do you have any clue as to how much material is available through LibGen, and how much of that is not available for purchase at all?
Please note, I'm not claiming that justifies Facebook's actions, but this wasn't theft (it was co
Re: (Score:2)
Dude, Mark Twain is dead, and some of his works are still under copyright. Do you think he expected that?
Name one.
From https://www.marktwainproject.o... [marktwainproject.org]
"In 1962, the University of California contracted with the copyright holder for the exclusive right to publish all then-unpublished writings of Mark Twain. Between 1962 and the end of 2002, the University of California's Mark Twain Project published all this unpublished material, either in print editions or on microfilm. Some of the material was first published or reprinted during this period by other publishers, under license from UC Press. Although the copyright on al
Re: (Score:2)
In the case of Facebook (whom I loathe) no revenue was lost through their actions.
If Facebook had purchased a copy (and perhaps paid a license fee from the authors) then they would have been fine to use the works for their AI. But since they torrented pirated copies, the authors were denied that revenue.
Re: (Score:2)
It does seem very shortsighted of Facebook. I don't expect it would have cost them a significant fraction of what they spent on their LLM program to have just downloaded the public domain archives on the Internet and called up some major publishers for a license for the rest. This is all likely an oversight caused by OMG CATCH UP WITH OPENAI NOW!
Re: (Score:2)
So, all the days and weeks I spent writing my first two now-published (by a small press, not self-published) novels should be unpaid labor? And how do I pay my bills, and how does the small press stay in business?
Re: (Score:2)
Are you the kind of person who whines about paying for labor? Like the shop rate is $90/hr, but you should pay less because you're special?
Re: (Score:2)
I doubt it.
Copyright holders have too much power in the US (second only to Japan), and the minute someone who knows their work was in Z-Library/LibGen comes forward with the right to sue, every single person who has downloaded LLaMA or any other LLM from Facebook is going to find their LLM unusable. And I'm pretty damn sure OpenAI did the same thing.
Re:jail time (Score:4)
> And I'm pretty damn sure OpenAI did the same thing.
Too bad the whistleblower making this claim was *murdered*. I wonder who benefits...
Re: (Score:2)
Exactly this. If you're in a technology race with other companies to be the top dog in one of the only (perceived) new frontiers of business, it's way cheaper to ignore a few laws and pay some dinky fine than it is to lose the race and get left behind the competition.
Re: (Score:2)
This is why in corporate cases like this that I hope someone will take action in the EU.
Seems like over here is the only place which keeps Big Corp reasonably in check.
Re: (Score:2)
It's true. The EU is the last glimmer of a chance for humanity to escape slavery to the oligarchy.
Re:jail time (Score:5, Interesting)
Re: jail time (Score:2)
Quod licet Iovi, non licet bovi. (What is permitted to Jupiter is not permitted to an ox.)
Re:jail time (Score:5, Insightful)
Corporate jail time is certainly possible. Their operations can be halted for 90 days or whatever the term is.
Natural persons are 100% vulnerable to jail time, yet the courts conclude that corporations have all of the rights of a person and none of the liabilities (other than garnishing money).
We can't have /Citizens United/ and immortal immune psychopathic corporations.
Information wants to be free (Score:1)
Re: (Score:3)
Torrenting the works is a bit different from just "reading" and "remembering" the information and incorporating it into new creative works. As part of the torrent process they also uploaded and shared the raw files with other torrenters. So even if you give them a pass on how they use the data after having "read" it, torrenting raises the issue of sourcing the raw, copyrighted, pirated files to others. It is the aspect of illegally uploading copyrighted materials through the torrent process that is seen
Re: (Score:2)
It was true of Uber/Lyft. If you drove an illegal cab in NYC you'd get fined and eventually pulled off the street. If you're Uber, they create new rules to make your behavior no longer illegal. (and then after you've put the existing cabs out of business
Let's see (Score:3)
Well let's see how this is handled. If nothing results from this and Zuck isn't personally fined HUGE for this (or even better jailed) then that sends a clear message that piracy is an acceptable form of obtaining digital material. My gut feeling tells me a small fine (slap on the wrist) is coming for poor Zuck. Hoist the sails mate!
Re: (Score:3)
Well let's see how this is handled
Wanna take a guess?
If nothing results from this and Zuck isn't personally fined HUGE for this (or even better jailed)
He won't be.
then that sends a clear message that piracy is an acceptable form of obtaining digital material.
No, it sends the clear message that crime committed by large corps is acceptable. If you try it as the little guy you will be destroyed utterly. Remember what happened with the Sony rootkit?
But I suspect you already knew that.
Re: (Score:2)
"Remember what happened with the Sony rootkit?" - yep, that is when Sony products ceased to exist in my world.
Re: (Score:2)
Piracy is an acceptable form of obtaining digital material IF you are a big corporation. So, what's new?
Torrenting is not a crime. I think. (Score:3)
Re: (Score:2)
Well, it's both: it's torrenting (especially the DISTRIBUTING-without-permission part) of works under copyright (that's a protected monopoly on distributing the works).
Re: (Score:2)
There are plenty of works that have licenses which allow distribution, regardless of the distribution method.
Re: (Score:2)
The phrasing here, top to bottom, seems to imply that the act of torrenting is illegal in itself, when in fact it's the content of what they torrented that makes the act illegal.
Not sure downloading copyrighted content via torrent is even illegal. It's distributing it that is, I think.
I know for sure that companies scanning torrent traffic for the movie industry don't file any complaint against somebody until he has the complete torrent and is seeding it, thus distributing it.
Re: (Score:2)
yeah this. I hate fuckerbook as much as the next sane individual but at least where I'm from downloading isn't illegal. It's easy enough to configure your torrent client to not seed or serve anything to peers.
What's interesting is unlike, say, normal consumer movie piracy, this content isn't even being consumed in the usual sense. No one read these downloaded books. It's still a legal grey area as to whether training on such content is a copyright violation.
Information is public: source, knowledge, shared (Score:2)
Copyright is theft of the commons. AI shall be RobInfoHood.
In 1787, James Madison submitted a provision to the Framers of the U.S. Constitution to "secure to literary authors their copyrights for a limited time."
In 1790, U.S. copyright law granted authors a monopoly right over their creations for 14 years, with the option of renewing that monopoly for another 14.
Article I, Section 8: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."
LOL "The magnitude of Meta's unlawful torrenting" (Score:3)
Double-digit TBs?! Just recently, Google pulled the plug on the unlimited G Suite plans, where people were storing multiple PBs. Yes, PBs: roughly all the streaming video content from every service, plus every vaguely popular Blu-ray and DVD ever ripped and put on P2P. Of course, all the music one could find, plus the shadow libraries, are a rounding error by comparison.
Just for kicks look for the guy using over one PB on Amazon Cloud Drive (rest in peace) all the way back in 2017.
Re: (Score:3)
There is a HUGE difference between the number of books per unit of storage space compared to the number of movies per unit of storage space. It is NOT about the number of bits that were torrented, rather it is about the number of individual works that were torrented.
Re: (Score:2)
Yes, as mentioned the books are just a rounding error TB-wise but anyone can torrent more books than a huge college library.
Re: (Score:2)
I think they mean a corporation using a torrent that large for infringement purposes. It probably happens all the time but this is the first one to get caught.
Re: (Score:2)
Yes, THIS. Especially with these AI things, when they are seriously discussing opening nuclear power plants to feed them electricity, this is one of the most basic, high-quality resources they could tap. That it's illegal to obtain it, and possibly to use it the way they do... it's probably a matter of asking for forgiveness instead of permission. That is not even considering the scenario they are high on, that they might get to have something more powerful than all the atomic bombs in the wo
eyeroll (Score:2)
And now we have court systems in tiny countries ordering the big internet companies to make worldwide changes or face trilllyyuuuuns of dollars of fines.
Let's just sue and prosecute everyone, for everything.
Metallica (Score:3)
I just hope that there's a Metallica book in there, and Lars loses his shit like he did last time something from Metallica was pirated.
Well, hopefully they didn't leech... (Score:2)
If they made sure that their upload to download ratio was at least 1:1, then they are good... /s
Until copyright is fair, torrent on! (Score:5, Insightful)
Re:Until copyright is fair, torrent on! (Score:4, Insightful)
As an author you could easily be out-lawyered for 5 years by a large company. No, the fair length of copyright is 14 years, with the option of renewing for an additional 14 years, as established by the Copyright Act of 1790.
That gives the creator ample time to make money from their creation without publishers or Hollywood studios using delaying tactics on authors to wait until the copyright expires, and then using their work without paying a dime.
Disney's 95 year act (thanks to Sonny Bono) needs to be repealed. But there is at least poetic justice in the fact that Disney's efforts at perpetual copyright have led in no small part to the complete creative bankruptcy of the Star Wars and Marvel franchises under the Disney umbrella. At least we can enjoy the schadenfreude of seeing Disney lose hundreds of millions of dollars each year on movies and TV shows that no one is watching.
Facebook's Llama is open source (Score:2)
Furthermore, here is the real crime: 1% of the well known authors get 99% of the money. That is to say: 99% of those book authors are probably glad the AI might spit out a reference to their obscure work.
Re: (Score:2)
Bull. The former is taking someone's work and not compensating them for it, assuming they are not giving away their own work. The latter is borrowing an item for a limited time and returning it for someone else to use, an item which has been paid for.
If you think everything should be free you go right ahead and do that for your work. The people who make a living off writing/music/movies/etc need to make money or they won't produ
Re: (Score:1)
I can't help it if you can't comprehend logic. I never said everything should be free. You just want to argue about straw men. Get the fuck outta here with that shit.
Re: (Score:2)
This is silly. Libraries buy their copies and, for digital works, take agreed-upon steps to ensure that each purchased copy is only ever loaned out to one person at a time.
Have you ever been to a library?
Re: (Score:1)
I grew up in a library and know exactly how it works. It was the only place to get a proper education.
Actually a lot of books are donated by thoughtful authors and publishing companies for the betterment of society.
If you think I am going to advise a company or myself to spend untold amounts of cash on untested software, you are insane.
I will demo the real code myself and then determine if it is worth the investment. Period.
I am not going to buy a car or anything else without trying it out first.
Now if I am
Could possibly be Fair Use (Score:3)
I have no idea what they have in mind, but there are a few things that could be considered fair use here, including making an index of the papers in question and calculating secure hashes like SHA-256 for all of them. I would not consider ingesting all of them into some quasi-delusional LLM AI model to be fair use, though that question has yet to be decided. TL;DR: copying things, even in volume, is not necessarily a copyright violation.
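To make the "index plus hashes" idea concrete, here is a minimal sketch, assuming the works sit as ordinary files under one local directory. The function name `build_hash_index` and the directory layout are hypothetical illustrations, not anything Meta is known to have done. It keeps only each file's relative path, size, and SHA-256 digest, never the contents.

```python
import hashlib
from pathlib import Path


def build_hash_index(root: str) -> dict:
    """Hypothetical helper: map each file's relative path to its size and SHA-256 hex digest."""
    base = Path(root)
    index = {}
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue  # skip directories; only regular files are indexed
        digest = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large files are never loaded into memory at once
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        index[path.relative_to(base).as_posix()] = {
            "size": path.stat().st_size,
            "sha256": digest.hexdigest(),
        }
    return index
```

Usage would be something like `build_hash_index("library/")`. The resulting index lets you check whether a given file matches a known work without keeping any of the text itself, which is the distinction the comment draws between indexing and ingesting.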
Don't think Fair Use applies to commercial uses (Score:4, Interesting)
, at least generally. Four points to it, cite below:
- Purpose of the use - Commercial v. educational or not for profit
- Nature of the work used - Technical documentation v. fictional novels
- Proportion of the work used - Five lines from a sonnet differs from five sentences from LOTR
- Effect of the use on the commercial marketability of the work - Probably negligible in most cases
IANAL (and lawyers are where these things end up), but Meta's arguments on "Purpose" and "Proportion" aren't readily apparent to me, even assuming they kept careful track of what they were hoovering.
--
https://www.copyright.gov/fair... [copyright.gov]
Re: (Score:3)
But torrenting is more than just using or making a copy for whatever use you have in mind. Doesn't it involve distributing the work as well?
Re: (Score:2)
That is a good point, so the question is whether Meta had good reason to believe that the other people participating in the torrents were acquiring the data in a way that was illegal or violated copyright law. I imagine they may have, but the government would probably have to prove that to demonstrate that they were guilty of contributory copyright infringement or some other violation. Copyright holders like to make the case that Internet Service Providers are guilty of contributory copyright infrin
Re: (Score:2)
YES! Courts have ruled against peer-to-peer downloaders because they were helping distribute, not merely copying for themselves, even though there was zero profit being made from the infringement. Meta is doing way more; but they are a corporation, so the key is to incorporate your whole family and make everybody an employee, then get a corporate defense lawyer... when you lose, just bankrupt the corporation; nobody gets hurt.
Now you understand (Score:1)
Now all you nerds understand what 2A enthusiasts experience when reading the news.
If you read 81 TB of torrents and thought Those are rookie numbers. You've gotta pump those numbers up [kym-cdn.com], now you know what I think every time I read a news story about someone with an "arsenal" consisting of 5 or 6 guns and a few thousand rounds. Rookie numbers.
Nothing to see, he already kissed the ring (Score:1)
Isn't that the point of all this digital data stuf (Score:2)
Strange ideas? (Score:2)
This is a very strange idea that is for some reason taken for granted?
It's also easier for the author to make and sell copies. If anything, durations should be shorter because of how rapidly the author can distribute.
How about we go back to 7 years.
Torrenting on guest wifi (Score:2)
Torrenting may be "unlawful" (Score:1)
But I consider it perfectly moral to torrent from the likes of Hollywood studios and big-wig authors who can afford to have me not financially supporting them.
After all, Hollywood is an industry built off exploiting everyone you can. All the sex scandals, all the allegations, all the drugs, laundering, and trafficking. Do you
Send the AI to jail!!!!!! (Score:2)
What I would love to see is for Meta to claim that it was the AI that made them do it. Blame the AI and then claim that because they were just following the AI's orders it is the AI that should be sent to jail. And they didn't read any of the content, only the AI did so it should be the one to be punished.
And to show just how angry we are with the AI, here, we put it on an SSD, you can take it and put it in jail for the next 2000 years.