AI Video Generator Runway Trained On Thousands of YouTube Videos Without Permission (404media.co)
samleecole writes: A leaked document obtained by 404 Media shows a company-wide effort at generative AI company Runway, where employees collected thousands of YouTube videos and pirated content as training data for its Gen-3 Alpha model. The model -- initially codenamed Jupiter and released officially as Gen-3 -- drew widespread praise from the AI development community and technology outlets covering its launch when Runway released it in June. Last year, Runway raised $141 million from investors including Google and Nvidia, at a $1.5 billion valuation.
The spreadsheet of training data viewed by 404 Media and our testing of the model indicates that part of its training data is popular content from the YouTube channels of thousands of media and entertainment companies, including The New Yorker, VICE News, Pixar, Disney, Netflix, Sony, and many others. It also includes links to channels and individual videos belonging to popular influencers and content creators, including Casey Neistat, Sam Kolder, Benjamin Hardman, Marques Brownlee, and numerous others.
Did they actually keep any copies? (Score:5, Insightful)
So did they actually 'keep' any copies of the videos, or just 'watch' them for training (i.e., the same way humans do when learning from something on YouTube)?
Because, if their AI is just 'watching' them the same way humans do, then it is not 'piracy', but just the same thing normal people do (watching YouTube to gain skills, learn about topics, etc.). How is it different, really?
Re: (Score:2, Insightful)
It's not, but there's a real desperation from the buggy drivers to pretend that automobiles break some rules that don't exist, so their adoption can be stopped.
Re: (Score:2)
But don't humans do the exact same thing when they 'remember' what is in a video?
The way human brains learn and retain something and the way AIs do it is pretty much identical. Why distinguish one from the other? If it is not allowed for AIs, then shouldn't it be banned for humans too?
Re: (Score:2, Informative)
Re: (Score:3)
AI is not the brain; stop comparing them. Humans don't do gradient ascent in their heads.
No, but our laws are written with human intelligence in mind and so therefore we have to compare AI to a human brain to best figure out how to deal with it in the law. In this case the OP is correct - humans retain some of the content they watch and they use that knowledge when necessary. So if the AI does the same (yes, the precise details of how the AI remembers are different, but the surface details are the same) then we should treat it the same. Where there is a difference is that AIs can sometimes reproduce copyrighted work almost exactly and that is where they break the law.
Re: (Score:2)
No, but our laws are written with human intelligence in mind and so therefore we have to compare AI to a human brain to best figure out how to deal with it in the law. In this case the OP is correct - humans retain some of the content they watch and they use that knowledge when necessary. So if the AI does the same (yes, the precise details of how the AI remembers are different, but the surface details are the same) then we should treat it the same. Where there is a difference is that AIs can sometimes reproduce copyrighted work almost exactly and that is where they break the law.
I generally agree with the standard you proposed -- laws are written with human intelligence in mind and we have to compare AI to a human brain.
I disagree with the way you've applied your standard here, because in the part I bolded, you bring up the core of your standard - the differences between a bot and a human brain - only to hand-wave it away like it's not important after all.
Let's phrase the question the opposite direction:
Tonight I wish upon a star and my wish is granted. Tomorrow every human being o
Re: Did they actually keep any copies? (Score:4, Informative)
The way human brains learn and retain something and the way AIs do it is pretty much identical.
The two could not possibly be more dissimilar. The way the human brain encodes information is vastly different from making the verbatim copies necessary for machine processing.
People need to stop treating these programs as if they are in any way similar to humans. They are not; the two are completely different.
Re: (Score:3)
The way the human brain encodes information is vastly different from making the verbatim copies necessary for machine processing.
I think you're wrong here. The end result is that neither my brain, nor the AI neural net, have "verbatim" copies of the original video. In order for my brain to process the video, I require a "verbatim" copy be streamed through my computing device and then presented as visible waves of light. Would it be okay if the video was displayed in visible light, and then a video sensor digitized that back into data that the machine processing then learned on? By your definition that would be perfectly fine, and the
Details Different, Overall Function Similar (Score:3)
The way the human brain encodes information is vastly different from making the verbatim copies necessary for machine processing.
Yes, but generally machine learning algorithms don't retain exact, verbatim copies either. When I am training a neural network on physics data, that training data is used to adjust the myriad parameters inside the model to help it better classify events. It does not remember all the training data I feed it; instead, like a human brain (although through a vastly different mechanism), it tries to distill the critical features of different events so that it can use that to identify them. It does not keep copies
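The point above can be sketched with a toy model. This is a minimal illustration, not Runway's or any real pipeline: stochastic gradient descent fits two parameters to ten made-up examples, and after training only the parameters survive, not the examples.

```python
# Toy stand-in for "training": fit y = w*x + b by SGD.
# The "training set" is invented for illustration.
data = [(x, 2.0 * x + 1.0) for x in range(10)]

w, b = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    for x, y in data:
        err = (w * x + b) - y   # prediction error on this example
        w -= lr * err * x       # nudge the parameters...
        b -= lr * err           # ...then move on; the example is not kept

print(w, b)  # close to the underlying rule y = 2x + 1
```

After the loop, the model consists of two floats; there is no copy of the ten training pairs inside it, which is the distinction the comment is drawing.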
Re: (Score:2)
The law says people and machines are different. Deal with it.
Re: (Score:1)
Re: (Score:2)
That is a nonsensical argument and you know it. Stop behaving like a petulant child.
Re: (Score:1)
Have to Extrapolate until Law Catches Up (Score:2)
The law says people and machines are different.
Yes it does, but only in so far as machines may not keep copies of copyrighted works. Machine learning generally does not do that - the copyrighted work is used to tweak parameters in the machine learning model and then discarded. As far as I am aware - and I'm definitely not a lawyer - there is no law about that particular way of using data because, until now, it has not been widespread enough for anyone to care.
Now that people are making money off using this data for training everyone is wanting some
Re: (Score:2)
Typically, the limitation is on storage and processing, not only on storage. Try again.
The one exception is that the ideas in a work are not protected. But machines cannot have insight, so they cannot identify or extract ideas.
Re: Have to Extrapolate until Law Catches Up (Score:2)
Re: (Score:1)
Re: (Score:3)
on the surface, we do something very similar to machine learning: we read/watch/listen to the data and retain whatever parts we find interesting or useful.
Again, I think you raise really valid points overall, but I disagree with how quickly you gloss over the "on the surface" part. Because that's the rub, isn't it? Humans only ever know ourselves as a set of epiphenomena. We have literally no idea what is actually happening under the surface that makes the surface what it is. AND we have absolutely no control over what happens beneath that surface. We can barely modify - with years of hard work and setbacks - minor aspects of the epi-Selves we do perceive on
We do Process Single "Pixels" (Score:2)
Making an equivalence because yes, "on the surface" both you and an LLM can be said to [read a string of characters and create an abstract representation of the contents] seems like such a meaningless way to frame the discussion.
As you point out, though, we have no real understanding of the processes that go on beneath the surface of our intelligence. Given that, comparing surface features is the only way to frame the discussion.
Yes, if we ignore the vast differences under the surface of cognition
How do you know that there are vast differences when we all agree that we have no idea how under the surface cognition works for humans? Even your example of how humans can't process every single pixel is wrong. While we do not process pixels, our eyes have individual sensors, rods and cones, and studies sho
Re: (Score:2)
As you point out, though, we have no real understanding of the processes that go on beneath the surface of our intelligence. Given that, comparing surface features is the only way to frame the discussion.
I could agree to this statement as being pragmatic about what kinds of effort we can currently make. But I think it's essential to say "comparing surface features while frequently reminding ourselves that we are only speaking about models, not what IS, and working hard to avoid the temptation to get lazy and substitute the map/model for the territory itself."
Yes, if we ignore the vast differences under the surface of cognition
How do you know that there are vast differences when we all agree that we have no idea how under the surface cognition works for humans?
That's a fair question. I'd say it falls in that existing category of "dude, what if like, when you see red, and I see red, we're seeing different thin
Re: (Score:2)
The AI has to retain something of the copyrighted material, which is a copyright violation.
Is this claim based on the concept of "derivative work"?
If an algorithm scans a file and updates a few thousand entries in a matrix from, say, 1.013 to 1.014, is that creating a derivative work?
Are you quite sure that the law is clear on this question?
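The nudge the comment describes ("from, say, 1.013 to 1.014") is easy to make concrete. This is a hypothetical illustration with invented numbers, not any real model's weights: one gradient step shifts each matrix entry slightly, and the scanned file itself appears nowhere in the result.

```python
# Hypothetical weight matrix and toy gradients (made-up numbers).
# One SGD step nudges each entry; the input that produced the
# gradients is not stored anywhere in the updated matrix.
weights = [[1.013, 0.502], [0.250, 1.750]]
grads   = [[-0.100, 0.030], [0.020, -0.050]]  # from one training example
lr = 0.01

updated = [[w - lr * g for w, g in zip(w_row, g_row)]
           for w_row, g_row in zip(weights, grads)]

print(updated[0][0])  # 1.013 has been nudged to (about) 1.014
```

Whether that kind of change constitutes a "derivative work" is exactly the open legal question the comment raises; the sketch only shows what the change physically is.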
Re: (Score:1)
Re: Did they actually keep any copies? (Score:2)
"The AI has to retain something of the copyrighted material, which is a copyright violation"
If you were going to make that argument in court, which clause would you do it under, and what precedent would you cite?
Re: Did they actually keep any copies? (Score:2)
I have him blocked now, but before I did he liked several of my tweets.
Recently an impersonator (not labeled parody) followed me, and being a good citizen even of Twitter, I tried to report it. It turns out that users you have blocked don't show up in the list for that.
Clown shoes!
Re:Did they actually keep any copies? (Score:5, Insightful)
"the exact same way humans do when learning from something on youtube"
Humans aren't machines -- legally. (It doesn't matter what you think "technically".) And the laws covering machines processing intellectual property are completely different.
You don't need a license to look at software and run it in your head; a machine does. You don't need a license to read a book -- the 'copy' (ephemeral or otherwise) in your head is legal -- but the same copy in a computer's memory, even if it doesn't get saved, needs a license.
"How is it different really?"
Doesn't really matter if it's 'different really'. It's completely different, "legally".
Re: (Score:2)
>Doesn't really matter if it's 'different really'. It's completely different, "legally".
Then this is kooky; it means there is something wrong with the law and the law needs to be changed. It makes no sense the way it is now, if that is the case.
Re: (Score:1)
Actually, there is something wrong with you. But you are not the first (quasi) religious asshole that wants their deranged religion put into law.
Re: Did they actually keep any copies? (Score:2)
Actually, there's something wrong with you if you think legislation shouldn't change because of advances in technology.
Re: (Score:2)
What advances in technology are you referring to? Machines with consciousness and free will? I am pretty sure we do not have those.
Re:Did they actually keep any copies? (Score:5, Insightful)
Sure you can argue, perhaps even rightfully, that there is similar derivative transformative work in your brain too. But it doesn't matter -- that one is legal and excepted from all IP law, and that's not an accident -- society has no interest in restricting that copy.
Whether the law is 'wrong' on this human / machine distinction is really up to what society wants. We know that society wants humans to be able to read books and create some internal representation / memory of the contents. That's why writers write books, and why people read them. And all the IP law in between tries to ensure writers can create and get compensated for their work, while ensuring people aren't thrown in jail for knowing the plot of The Lord of the Rings after reading it.
The law codifies treatment of physical copies, digital copies, broadcasting rights, derivative works, etc. There is a social agreement on how this all works, and that is then (imperfectly) codified in law.
So the question at hand is what does society at large want machines to be able to do with this IP? Is this a processing that requires licensing or not? Perhaps not, but it would also be completely rational and consistent for society to decide that the machine version does require permission or licensing, while the human remains exempt. Nothing kooky about that.
It all makes perfect sense when you remember that the law is fundamentally not really about 'technical consistency' although that is certainly a laudable secondary objective, it is primarily about codifying the 'will of the people'.
Re: (Score:2)
I agree with you, and I think society wants an open system with freely available information, except for the luddites (or Slashdotters) that don't want society to change. The internet was built on the concept of free information and access for all, before the big corporations got involved. People should be compensated for their work, within reason, and it's usually the large corporation pushing for more than necessary, not the creative individual. But what is happening here is going to set a precedent for all
Re: (Score:2)
and I think society wants an open system with freely available information
That sounds great. But I'm not sure society wants a system where AIs read your book and rip off the ideas and themes and then produce a hundred, or a thousand, or 10 million perfectly legal knockoffs of it 2 seconds after it gets published, in every setting, time period, tech level, fantasy level, rated G to XXX, ensuring nobody needs your original, and you get nothing at all ever.
How do we, as society, prevent that?
Or do we need to... maybe writing, poetry, and music are just hobby pursuits now? Nobody can
Re: (Score:2)
Sure you can argue, perhaps even rightfully, that there is similar derivative transformative work in your brain too. But it doesn't matter -- that one is legal and excepted from all IP law, and that's not an accident -- society has no interest in restricting that copy.
Whether the law is 'wrong' on this human / machine distinction is really up to what society wants. We know that society wants humans to be able to read books and create some internal representation / memory of the contents. That's why writers write books, and why people read them. And all the IP law in between tries to ensure writers can create and get compensated for their work, while ensuring people aren't thrown in jail for knowing the plot of The Lord of the Rings after reading it.
The law codifies treatment of physical copies, digital copies, broadcasting rights, derivative works, etc. There is a social agreement on how this all works, and that is then (imperfectly) codified in law.
So the question at hand is what does society at large want machines to be able to do with this IP? Is this a processing that requires licensing or not? Perhaps not, but it would also be completely rational and consistent for society to decide that the machine version does require permission or licensing, while the human remains exempt. Nothing kooky about that.
It all makes perfect sense when you remember that the law is fundamentally not really about 'technical consistency' although that is certainly a laudable secondary objective, it is primarily about codifying the 'will of the people'.
Thank you for writing that so well. That is the framework that really matters in all these conversations.
My belief is that American discussions tend to display a major weakness in this area. The nature of that weakness is simple: our core legal framework and system of governance is based on the notion of personal rights. Each person possesses a set of inalienable rights which are not granted by the government or the democratic process. Instead, these rights attach to a person as a n
Re: (Score:2)
Indeed. The law says machines are not people. Period. As to what humans actually do when they look at things, Science does not know at this time. There is only speculation.
Re: (Score:2)
Watching the way humans do? I don't think it's even possible for a machine to do so, as we lack the technology. The closest approximation I think we could get is to point a webcam at a monitor, and I doubt they did even that.
Re: (Score:2)
Time for some major lawsuits from Google. After all, the AI was able to watch the videos quickly and without the mandatory 10 to 45 second advertisement viewing! If Runway can watch the ad-free videos and Google says nothing, then clearly everyday users should have access to their methods of bypassing ads.
Re: (Score:2)
So you're claiming if I have AI read some content, it's not piracy (aka, copyright infringement).
In that case, can I have a
Re: Did they actually keep any copies? (Score:2)
Not liking AI is one thing (Score:4, Insightful)
But when is this stupid copyright shit going to stop? If something is freely available on the internet, meaning not locked behind a paywall or authentication of any type, then common sense would say it should be open to be used. There are so many variants of this stupid shit that it isn't even funny. Like Google trying to charge you for capabilities like playing videos in the background on Android, but yet you can watch (listen to) the exact same video on a PC with the video minimized. It is idiotic that companies try to claim they have that level of control when they should not. If it is free in any form, it should be free in all forms. And free means you can use it however you see fit as long as you don't RE-distribute it. Yes, AI makes use of the knowledge that it gains from the video, but so does every human that watches it. Copyright laws are stupid as fuck.
Re: (Score:1)
You are confusing freely available with publicly available.
Content is there for the public to view. It is not there for random people to take it, modify it, and slap their own copyrights on it.
Training is illegal? (Score:1)
So is it illegal for me to train myself using youtube videos? Should I get electroshock to have my memory wiped after viewing them?
Re: (Score:2)
No, it is illegal for you to train an AI using youtube videos.
The philosophical equivalence you are trying to draw does not exist under the law. Using online content to train an AI is different than a human watching online content to train themselves, under the law, and that's that. Any argument you make about how these two cases should be considered the same will have no weight under the law.
Be that as it may, the illegality of doing this has not stopped any company from doing it with reckless abandon.
Re: (Score:2)
No, it is illegal for you to train an AI using youtube videos.
The philosophical equivalence you are trying to draw does not exist under the law. Using online content to train an AI is different than a human watching online content to train themselves, under the law, and that's that. Any argument you make about how these two cases should be considered the same will have no weight under the law.
Exactly. The point that "training" an AI and "training" yourself are somehow similar will get you laughed out of court. Incidentally, the point is a purely philosophical one at this time. Science still has no idea how a human mind works. Physicalism is belief, not Science.
Re: (Score:2)
But the fact is they are doing those things, and they will get much better at it over the immediate upcoming years. The notion that physicalism is only belief, not science, in this day and age, is what deserves to be laughed o
Re: (Score:2)
Complete nonsense. AI does not "think" and the law is spot-on. That _you_ are deeply in a quasi-religious delusion is your defect. Incidentally, your "reasoning" is circular, just the same as the theist fuckups like to use.
Re: (Score:2)
Clearly there has to be substantial functional similarity between the two mechanism+structure instances. Many details will differ, but in an emergent systems functionality sense there has to be substantial equivalence, since the extremely complex problem the two things are solving is essentially the same.
Re: (Score:2)
Actually, the hypothesis that two completely dissimilar mechanisms + structures could both result in behaviour as complex and specific as, for example, visual-input object classification and spatial decision making (Tesla FSD "pixels to generalized car movement control" and animal "optic light receptor cell signals to animal movement control") is what is nonsense.
Clearly there has to be substantial functional similarity between the two mechanism+structure instances. Many details will differ, but in an emergent systems functionality sense there has to be substantial equivalence, since the extremely complex problem the two things are solving is essentially the same.
Every time someone preaches this line of thought I am amazed at how they never seem to recognize they're making the same basic argument as Biblical Creationists who bring up "irreducible complexity" and "the odds against Earth being in exactly the right solar distance and having a moon and ionosphere etc etc" and "all the things that had to come together just right to produce the amazing human being in this amazing physical environment", as evidence that therefore the whole thing MUST have been Designed by
Re: (Score:2)
Nowhere did I talk about irreducible complexity. Or a creator. Your words not mine.
I simply say that if there is a problem that is incredibly complex AND quite specific, it is most likely that the (also very complex, evolved or designed) solutions to that problem are functionally very similar. This is a notion that is consistent with the well known phenomenon of convergent evolution. Octopus eyes and other animal eyes, for example: functionally equivalent, yet evolved completely independently.
oops triple negative (Score:2)
Re: (Score:2)
I'm missing the crime here (Score:2)
So employees essentially had their AI entity consume freely available streaming videos and learn stuff from them? Seems normal to me. I don't even see the problem with getting a Netflix sub and doing the same thing. If the tools get good enough, cost of production should drop tremendously. The possibilities are both inspiring and terrifying for our species.
What actually is the problem here? (Score:3)
I mean it's fine for humans to freely watch those things.
Why is it a problem if a machine does?
Re: (Score:2)
I would say it's less of a problem as the size and diversity of the training set increases. With too small and uniform a set, you're making a system that produces poor copies of recognizable IP. Once you have a honkin' big set with a lot of variation in it, the specifics that someone might rightly call 'theirs' are sufficiently obscured that I don't think anybody has a right to complain.
The issue here isn't ownership of the original data; it's about stopping computers from developing the capacity to outperform humans.
Re: (Score:2)
Ahh got it, thanks.
I for sure see it as a problem with AI's that "write" code.
My understanding is that all they're really doing is regurgitating human-written code, minus the licence, that it's seen "somewhere", and not even it can tell you where it actually got the code, so you can't even check the licence.
Re: (Score:2)
Because machines are not humans. Maybe stop making stupid claims?
Re: (Score:2)
Where did I claim that machines are humans?
Maybe stop making stupid assumptions?
Re: (Score:2)
You have to disgrace yourself even more? Nice. Well then: maybe you missed the tiny detail that humans are treated differently by the law than inanimate objects.
Re: (Score:2)
You need to stop listening to the voices in your head and just respond to what I actually wrote.
Also you need to find better ways to communicate than just being a total wanker.
Re: (Score:2)
Reverting to cheap rhetorical tricks, are we? Well, thanks for admitting defeat.
Re: (Score:2)
Thanks for proving my point.
After AI takes all the fun jobs, I'm sure it will stop (Score:2)
Re: (Score:2)
Robots won't be doing the drudgery. It appears the shit work will be done by humans who will be grateful to have any job at all. Meanwhile, all the cool & fun jobs like making music, making any type of art (photos, videos, animation) for entertainment, or any type of creative writing will be performed by AI. Get back to work, Slave. How else are you going to pay off all this debt previous generations have racked up!?
A lot of this confusion comes from the fact that our humanities-centered educational system has given people a distorted perception of cognitive tasks. We think of the human mind as being Maslow's pyramid, with engineering and science belonging on the bottom layers because they solve the tangible problems like how to get food, how to build shelter, how to keep yourself safe from attack. Meanwhile, painting, music, poetry are up there at the top as the exalted products of advanced minds which have satisfied
Re: (Score:2)
The Spotify Strategy (Score:5, Interesting)
In the end, the class action didn't go to trial. The company and the folks who had songs in that tricky 10% ended up reaching a deal. Spotify agreed to pay them for all its past copyright infringements and set up a system to pay for streaming royalties going forward...
So if you think about the author's lawsuit from OpenAI's perspective, maybe the lawsuit isn't the worst thing. The company has used all of this copyrighted material, allegedly, hundreds of thousands of books. There is no good way to unfeed all of those books to their AI. But also, it would be a huge pain to track down every single author and work out a licensing deal for those books. So maybe this lawsuit will let them do it all in one fell swoop by negotiating with this handy group of thousands of authors who have collectively sued them.
Here's what they want that permission to look like (Score:2)
"Oh please, Mr. Trillion Dollar Corporation, can I please consume public content on the publicly available Internet? No? Don't worry, I'll make sure the use is ethical. And to prove it, here's a few million a month. Ethical! Thanks."
Yet another money grab (Score:3)
Re: (Score:2)
Consider a network of humanoid robots (Score:2)
And sharing some of these particulars, but mostly sharing the abstracted insights they process from the long sequence of many raw inputs, amongst each other, in a symbolic memory that is partly resident in their physical robot bodies, and partly resident in the
AI vs human learning from data (Score:2)
Isn't this what they all do? (Score:2)
Name one AI company that has *not* used training data without permission!
Re: (Score:3)
The data is scraped and processed by machines because the data is out there on the open worldwide web.
Consider: How did we get instant google search of all the open web information?
Answer: Google used (and uses) software agents ("bots") to web-scrape all the content and indexed it for our searching pleasure.
Did Google ask for permission to do this? No. They just followed the protocol of the day, which was to not scrape any website whose web server asked not to be scraped (via robots.txt).
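The "protocol of the day" the comment alludes to is robots.txt, which Python's standard library can evaluate. A minimal sketch: the rules, the "MyScraper" user agent, and the example.com URLs below are all invented for illustration, not any real site's policy.

```python
from urllib import robotparser

# Parse a hypothetical robots.txt instead of fetching one over the network.
rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /private/
Allow: /
""".splitlines())

# A polite scraper checks before fetching each URL.
print(rp.can_fetch("MyScraper", "https://example.com/videos/abc"))  # allowed
print(rp.can_fetch("MyScraper", "https://example.com/private/x"))   # disallowed
```

In real use, `rp.set_url(".../robots.txt")` plus `rp.read()` fetches the live file; the point of the comment is that this opt-out convention is the only permission mechanism scrapers historically honored.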
Re: Isn't this what they all do? (Score:2)
Re: (Score:2)
You're going to need a stronger case.
What is the fair use of learning? (Score:2)
This isn't real AI. Still, where is the discussion of the fair use of what is put out there? We learn from watching videos. Where is the fair use line for machines learning by reading a copyrighted textbook or watching a video?