AI Technology

AI Video Generator Runway Trained On Thousands of YouTube Videos Without Permission (404media.co) 81

samleecole writes: A leaked document obtained by 404 Media shows a company-wide effort at generative AI company Runway, where employees collected thousands of YouTube videos and pirated content for training data for its Gen-3 Alpha model. The model -- initially codenamed Jupiter and released officially as Gen-3 -- drew widespread praise from the AI development community and technology outlets covering its launch when Runway released it in June. Last year, Runway raised $141 million from investors including Google and Nvidia, at a $1.5 billion valuation.

The spreadsheet of training data viewed by 404 Media and our testing of the model indicate that part of its training data is popular content from the YouTube channels of thousands of media and entertainment companies, including The New Yorker, VICE News, Pixar, Disney, Netflix, Sony, and many others. It also includes links to channels and individual videos belonging to popular influencers and content creators, including Casey Neistat, Sam Kolder, Benjamin Hardman, Marques Brownlee, and numerous others.

This discussion has been archived. No new comments can be posted.

  • by takochan ( 470955 ) on Thursday July 25, 2024 @10:33AM (#64654660)

    So did they actually 'keep' any copies of the videos, or just 'watch' them for training (i.e. the exact same way humans do when learning from something on youtube).

    Because, if their AI is just 'watching' them the same way humans do, then it is not 'piracy', but just the same things normal people do (watching youtube to gain skills, learn about topics...etc.). How is it different really?

    • Re: (Score:2, Insightful)

      by Luckyo ( 1726890 )

      It's not, but there's a real desperation from the buggy drivers to pretend that automobiles break some rules that don't exist, so their adoption can be stopped.

    • by vux984 ( 928602 ) on Thursday July 25, 2024 @10:50AM (#64654718)

      "the exact same way humans do when learning from something on youtube"

      Humans aren't machines -- legally. (It doesn't matter what you think "technically".) And the laws covering machines processing intellectual property are completely different.

      You don't need a license to look at software and run it in your head. A machine does. You don't need a license to read a book; the 'copy' (ephemeral or otherwise) in your head is legal -- the same copy in a computer's memory, even if it doesn't get saved, needs a license.

      "How is it different really?"

      Doesn't really matter if it's 'different really'. It's completely different, "legally".

      • >Doesn't really matter if it's 'different really'. It's completely different, "legally".

        Then this is kooky. It means there is something wrong with the law, and the law needs to be changed. It makes no sense the way it is now if that is the case.

        • by gweihir ( 88907 )

          Actually, there is something wrong with you. But you are not the first (quasi) religious asshole that wants their deranged religion put into law.

        • by vux984 ( 928602 ) on Thursday July 25, 2024 @11:56AM (#64654944)

          Sure you can argue, perhaps even rightfully, that there is similar derivative transformative work in your brain too. But it doesn't matter -- that one is legal and excepted from all IP law, and that's not an accident -- society has no interest in restricting that copy.

          Whether the law is 'wrong' on this human / machine distinction is really up to what society wants. We know that society wants humans to be able to read books and create some internal representation / memory of the contents. That's why writers write books, and why people read them. And all the IP law in between tries to ensure writers can create and get compensated for their work, while ensuring people aren't thrown in jail for knowing the plot of Lord of the Rings after reading it.

          The law codifies treatment of physical copies, in physical form, digital copies, broadcasting rights, derivative works, etc. There is a social agreement on how this all works, and that is then (imperfectly) codified in law.

          So the question at hand is what does society at large want machines to be able to do with this IP? Is this a processing that requires licensing or not? Perhaps not, but it would also be completely rational and consistent for society to decide that the machine version does require permission or licensing, while the human remains exempt. Nothing kooky about that.

          It all makes perfect sense when you remember that the law is fundamentally not really about 'technical consistency' although that is certainly a laudable secondary objective, it is primarily about codifying the 'will of the people'.

          • by neoRUR ( 674398 )

            I agree with you, and I think society wants an open system with freely available information, except for the luddites (or slashitdes) that don't want society to change. The internet was built on the concept of free information and access to all, before the big corporations got involved. People should be compensated for their work, within reason, and it's usually the large corporation pushing for more than necessary, not the creative individual. But what is happening here is going to set a precedent for all

            • by vux984 ( 928602 )

              and I think society wants an open system with freely available information

              That sounds great. But I'm not sure society wants a system where AI's read your book and ripoff the ideas and themes and then produce a hundred, or a thousand, or 10 million perfectly legal knockoffs of it 2 seconds after it gets published, in every setting, time period, tech level, fantasy level, rated G to XXX, ensuring nobody needs your original, and you get nothing at all ever.

              How do we, as society, prevent that?

              Or do we need to... maybe writing, poetry, and music are just hobby pursuits now? Nobody can

          • Sure you can argue, perhaps even rightfully, that there is similar derivative transformative work in your brain too. But it doesn't matter -- that one is legal and excepted from all IP law, and that's not an accident -- society has no interest in restricting that copy.

            Whether the law is 'wrong' on this human / machine distinction is really up to what society wants. We know that society wants humans to be able to read books and create some internal representation / memory of the contents. That's why writers write books, and why people read them. And all the IP law in between tries to ensure writers can create and get compensated for their work, while ensuring people aren't thrown in jail for knowing the plot of Lord of the Rings after reading it.

            The law codifies treatment of physical copies, in physical form, digital copies, broadcasting rights, derivative works, etc. There is a social agreement on how this all works, and that is then (imperfectly) codified in law.

            So the question at hand is what does society at large want machines to be able to do with this IP? Is this a processing that requires licensing or not? Perhaps not, but it would also be completely rational and consistent for society to decide that the machine version does require permission or licensing, while the human remains exempt. Nothing kooky about that.

            It all makes perfect sense when you remember that the law is fundamentally not really about 'technical consistency' although that is certainly a laudable secondary objective, it is primarily about codifying the 'will of the people'.

            Thank you for writing that so well. That is the framework that really matters in all these conversations.

            My belief is American discussions tend to display a major weakness when it comes to discussions in this area. The nature of that weakness is simple: our core legal framework and system of governance is based on the notion of personal rights. Each person possesses a set of inalienable rights which are not granted by the government or the democratic process. Instead, these rights attach to a person as a n

      • by gweihir ( 88907 )

        Indeed. The law says machines are not people. Period. As to what humans actually do when they look at things, Science does not know at this time. There is only speculation.

    • Watching the way humans do? I don't think it's even possible for a machine to do so, as we lack the technology. The closest approximation I think we could get is to point a webcam at a monitor, and I doubt they did even that.

    • Time for some major lawsuits from Google. After all, the AI was able to watch the videos quickly and without the mandatory 10 to 45 second advertisement viewing! If Runway can watch the ad-free videos and Google says nothing, then clearly everyday users should have access to their methods of bypassing ads.

    • by tlhIngan ( 30335 )

      So did they actually 'keep' any copies of the videos, or just 'watch' them for training (i.e. the exact same way humans do when learning from something on youtube).

      Because, if their AI is just 'watching' them the same way humans do, then it is not 'piracy', but just the same things normal people do (watching youtube to gain skills, learn about topics...etc.). How is it different really?

      So you're claiming if I have AI read some content, it's not piracy (aka, copyright infringement).

      In that case, can I have a

    • They downloaded the videos, put them in a database, and used them to "train" the AI.
  • by rtkluttz ( 244325 ) on Thursday July 25, 2024 @10:34AM (#64654664) Homepage

    But when is this stupid copyright shit going to stop. If something is freely available on the internet, meaning not locked behind a paywall or authentication of any type, then common sense would say it should be open to be used. There are so many variants of this stupid shit that it isn't even funny. Like Google trying to charge you for capabilities like playing videos in the background on android, but yet you can watch (listen to) the exact same video on a PC with the video minimized. It is idiotic that companies try to claim they have that level of control when they should not. If it is free in any form, it should be free in all forms. And free means you can use it however you see fit as long as you don't RE-distribute it. Yes, AI makes use of the knowledge that it gains from the video, but so does every human that watches it. Copyright laws are stupid as fuck.

    • by tokul ( 682258 )

      You are confusing freely available with publicly available.

      Content is there for the public to view. It is not there for random people to take it, modify it, and slap their own copyright on it.

  • So is it illegal for me to train myself using youtube videos? Should I get electroshock to have my memory wiped after viewing them?

    • No, it is illegal for you to train an AI using youtube videos.

      The philosophical equivalence you are trying to draw does not exist under the law. Using online content to train an AI is different than a human watching online content to train themselves, under the law, and that's that. Any argument you make about how these two cases should be considered the same will have no weight under the law.

      Be that as it may, the illegality of doing this has not stopped any company from doing this with reckless abandon.

      • by gweihir ( 88907 )

        No, it is illegal for you to train an AI using youtube videos.

        The philosophical equivalence you are trying to draw does not exist under the law. Using online content to train an AI is different than a human watching online content to train themselves, under the law, and that's that. Any argument you make about how these two cases should be considered the same will have no weight under the law.

        Exactly. The point that "training" an AI and "training" yourself is somehow similar will get you laughed out of court. Incidentally, the point is a purely philosophical one at this time. Science still has no idea how a human mind works. Physicalism is belief, not Science.

        • If physicalism were not true, then no one would have anything to worry about with these very physical AIs, would they, because they couldn't possibly think, and synthesize diverse information and turn it into abstract knowledge, and create by rearranging it flexibly and experimentally.

          But the fact is they are doing those things, and they will get much better at it over the immediate upcoming years. The notion that physicalism is only belief, not science, in this day and age, is what deserves to be laughed o
          • by gweihir ( 88907 )

            Complete nonsense. AI does not "think" and the law is spot-on. That _you_ are deeply in a quasi-religious delusion is your defect. Incidentally, your "reasoning" is circular, just the same as the theist fuckups like to use.

            • Actually, the hypothesis that two completely dissimilar mechanisms + structures could both result in behaviour as complex and specific as, for example, visual-input object classification and spatial decision making (Tesla FSD "pixels to generalized car movement control" and animal "optic light receptor cell signals to animal movement control") is what is nonsense.

              Clearly there has to be substantial functional similarity between the two mechanism+structure instances. Many details will differ, but in an emer
              • Actually, the hypothesis that two completely dissimilar mechanisms + structures could both result in behaviour as complex and specific as, for example, visual-input object classification and spatial decision making (Tesla FSD "pixels to generalized car movement control" and animal "optic light receptor cell signals to animal movement control") is what is nonsense.

                Clearly there has to be substantial functional similarity between the two mechanism+structure instances. Many details will differ, but in an emergent systems functionality sense, there has to be substantial equivalence. Since the extremely complex problem the two things are solving is essentially the same.

                Every time someone preaches this line of thought I am amazed at how they never seem to recognize they're making the same basic argument as Biblical Creationists who bring up "irreducible complexity" and "the odds against Earth being in exactly the right solar distance and having a moon and ionosphere etc etc" and "all the things that had to come together just right to produce the amazing human being in this amazing physical environment", as evidence that therefore the whole thing MUST have been Designed by

                • You've mischaracterized my argument.
                  Nowhere did I talk about irreducible complexity. Or a creator. Your words not mine.

                  I simply say that if there is a problem that is incredibly complex AND quite specific, it is most likely that the (also very complex, evolved or designed) solutions to that problem are functionally very similar. This is a notion that is consistent with the well known phenomenon of convergent evolution. Octopus eyes and other animal eyes for example. Functionally equivalent, yet evolved comp
    • It's not a problem for you to train yourself on publicly available videos. That doesn't mean you can take copies of those videos and train other people with them; it's not your content. Unfortunately I have to make training material from time to time, and unless we have rights to the IP (to legal's satisfaction), I can't use it in the training material. If it's publicly available on the internet, that only gives you permission to view it (excluding the ways that they could intentionally release IP for any use)
  • So employees essentially had their AI entity consume freely available streaming videos and learned stuff from them? Seems normal to me. I don't even see the problem with getting a Netflix sub and doing the same thing. If the tools get good enough, cost of production should drop tremendously. The possibilities are both inspiring and terrifying for our species.

  • by JustNiz ( 692889 ) on Thursday July 25, 2024 @10:48AM (#64654712)

    I mean it's fine for humans to freely watch those things.
    Why is it a problem if a machine does?

    • I would say it's less of a problem as the size and diversity of the training set increases. With too small and uniform a set, you're making a system that produces poor copies of recognizable IP. Once you have a honkin' big set with a lot of variation in it, the specifics that someone might rightly call 'theirs' are sufficiently obscured that I don't think anybody has a right to complain.

      The issue here isn't ownership of the original data; it's about stopping computers from developing the capacity to outpe

      • by JustNiz ( 692889 )

        Ahh got it, thanks.
        I for sure see it as a problem with AI's that "write" code.
        My understanding is that all they're really doing is regurgitating human-written code, minus the licence, that it's seen "somewhere", and even it can't tell you where it actually got it from, so you can't even check the licence.

    • by gweihir ( 88907 )

      Because machines are not humans. Maybe stop making stupid claims?

      • by JustNiz ( 692889 )

        Where did I claim that machines are humans?
        Maybe stop making stupid assumptions?

        • by gweihir ( 88907 )

          You have to disgrace yourself even more? Nice. Well then: Maybe you missed the tiny detail that humans are treated different by the law than inanimate objects.

          • by JustNiz ( 692889 )

            You need to stop listening to the voices in your head and just respond to what I actually wrote.
            Also you need to find better ways to communicate than just being a total wanker.

  • Robots won't be doing the drudgery. It appears the shit work will be done by humans who will be grateful to have any job at all. Meanwhile, all the cool & fun jobs like making music, making any type of art (photos, videos, animation) for entertainment, or any type of creative writing will be performed by AI. Get back to work, Slave. How else are you going to pay off all this debt previous generations have racked up !?
    • Robots won't be doing the drudgery. It appears the shit work will be done by humans who will be grateful to have any job at all. Meanwhile, all the cool & fun jobs like making music, making any type of art (photos, videos, animation) for entertainment, or any type of creative writing will be performed by AI. Get back to work, Slave. How else are you going to pay off all this debt previous generations have racked up !?

      A lot of this confusion comes from the fact that our humanities-centered educational system has given people a distorted perception of cognitive tasks. We think of the human mind as being Maslow's pyramid, with engineering and science belonging on the bottom layers because they solve the tangible problems like how to get food, how to build shelter, how to keep yourself safe from attack. Meanwhile, painting, music, poetry are up there at the top as the exalted products of advanced minds which have satisfied

      • I found your post interesting. However, I will point out that despite being programmatic and easy for AI along the lines you discuss, it's still something people want to do more than, say, fairly complex manual labor like roofing a house. I bring this up only because my personal recollection of the "pitch" behind automation and AI is that it'll free up humans to do more satisfying and enjoyable work and free us from drudgery. From the looks of things anything that requires moving around in the real world away
  • The Spotify Strategy (Score:5, Interesting)

    by Koreantoast ( 527520 ) on Thursday July 25, 2024 @11:03AM (#64654774)
    The AI companies are trying to do the same thing Spotify did. It was near impossible for Spotify to identify every single copyright owner to negotiate and pay royalties for every single song - they had like 90% of the music negotiated and identified through large producers but had difficulty hunting down the remaining 10% of indie publishers and independent musicians. So they actually embraced a class action lawsuit brought forward by a couple musicians. The settlement created a court approved mechanism that lets them publish everything and then the artists have to come forward and collect. It's a great deal for Spotify since they no longer have to hunt for every single IP owner; now those artists have to come to them. From a Planet Money podcast [npr.org] that discussed it with UCLA law professor Xiyin Tang [ucla.edu]:

    In the end, the class action didn't go to trial. The company and the folks who had songs in that tricky 10% ended up reaching a deal. Spotify agreed to pay them for all its past copyright infringements and set up a system to pay for streaming royalties going forward...

    So if you think about the author's lawsuit from OpenAI's perspective, maybe the lawsuit isn't the worst thing. The company has used all of this copyrighted material, allegedly, hundreds of thousands of books. There is no good way to unfeed all of those books to their AI. But also, it would be a huge pain to track down every single author and work out a licensing deal for those books. So maybe this lawsuit will let them do it all in one fell swoop by negotiating with this handy group of thousands of authors who have collectively sued them.

  • "Oh please, Mr. Trillion Dollar Corporation, can I please consume public content on the publicly available Internet? No? Don't worry, I'll make sure the use is ethical. And to prove it, here's a few million a month. Ethical! Thanks."

  • by Malay2bowman ( 10422660 ) on Thursday July 25, 2024 @11:25AM (#64654842)
    Just another money grab, hoping to get a few ducats out of the absolute vagueness and nebulous nature of how fair use vs copyright violation currently applies to using videos to train AI models. But this does not bother me nearly as much as YouTube using its own AI to censor viewer comments on videos. All doubts of whether this is only used to stop "hateful" or "harmful" posts and anti vax opinions went completely out the window this year when I and so many others noticed that our comments were being automatically removed because of political content that did not follow the narrative. And that includes mentioning that mail is being stolen primarily for identity theft, and the closure of long term mental institutions 50 years ago and not requiring cities to provide adequate services to those patients who got evicted from their beds, among other things. And other people are getting their comments auto removed for something as simple and harmless as saying "I really like your video. Thank you!," or just posting a dry, technical description of something. And if that wasn't bad enough, this bot actually tailors itself to a specific user and filters them more aggressively based on the videos that person watched, and if that person looks to be "too political". In short, you are a target if you think for yourself and don't just load up on cat videos all day. YT seems to think thought and words are too subversive and dangerous to their narrative.
    • ..also, YouTube does not tell you what 'sin' you committed. Instead they gaslight you into thinking your post went through, it will even remain visible to you for a while, but not to others. You can see what really happened by either using incognito mode or logging in with a different account, and checking the video's comments to see if your comment is actually there.
  • Wandering around, seeing the products of the human world (including videos, and written works, designed objects) and the natural world, with their cameras, and hearing the products of the human world (including musical pieces) with their microphones.

    And sharing some of these particulars, but mostly sharing the abstracted insights they process from the long sequence of many raw inputs, amongst each other, in a symbolic memory that is partly resident in their physical robot bodies, and partly resident in the
  • I see some claims here that an AI model learning from data is considered differently in law than a human being learning from the same data. It seems plausible to me that law might make that distinction but I would love to see some references. Does anyone have any to share?
  • Name one AI company that has *not* used training data without permission!

    • This kind of thing has been going on since the web was invented.

      The data is scraped and processed by machines because the data is out there on the open worldwide web.

      Consider: How did we get instant google search of all the open web information?

      Answer: Google used (and uses) software agents ("bots") to web-scrape all the content and indexed it for our searching pleasure.

      Did Google ask for permission to do this? No. They just followed the protocol of the day which was to not scrape any website whose web serv
  • This isn't real AI. Still, where is the discussion of what counts as fair use of what is put out there? We learn from watching videos. Where is the fair use of machines learning from reading a copyrighted textbook or watching a video?
