AI Technology

Getty Images is Suing the Creators of AI Art Tool Stable Diffusion for Scraping Its Content (theverge.com)

Getty Images is suing Stability AI, creators of popular AI art tool Stable Diffusion, over alleged copyright violation. From a report: In a press statement shared with The Verge, the stock photo company said it believes that Stability AI "unlawfully copied and processed millions of images protected by copyright" to train its software and that Getty Images has "commenced legal proceedings in the High Court of Justice in London" against the firm. Getty Images CEO Craig Peters told The Verge in an interview that the company has issued Stability AI with a "letter before action" -- a formal notification of impending litigation in the UK.
  • The software is open source, even if the official models get taken down by DMCA, "AI Pirates" will continue to make models and will distribute them through more uncensored means. The copyright space is getting more and more loose every year. It looks like copyright lawyers are practicing for Mickey in 2024.
    • Re: (Score:3, Insightful)

      by TWX ( 665546 )

      It's not even just AI. Pastiches are commonplace these days, and learning from existing art is how art is taught. I worked for twenty years in information technology for a school district, and the interior walls of the district boardroom and its building were covered in student art displays that called out the masters from whom the students drew inspiration, literally, "In the manner of Matisse," or, "In the manner of Dalí."

      If Getty made these images available on the Internet without putting them behin

      • by Travelsonic ( 870859 ) on Tuesday January 17, 2023 @10:36AM (#63216218) Journal

        Learning from existing art is how art is taught.

        If Warner Bros. animators didn't imitate the Disney style, we literally wouldn't have the Looney Tunes and Merrie Melodies cartoons we know today.

        If you like learning about that stuff, KaiserBeamz' "The Merrie History of Looney Tunes" on YouTube, an ongoing series, is quite a rabbit hole to go down (no pun intended).

        • by Junta ( 36770 ) on Tuesday January 17, 2023 @11:04AM (#63216312)

          Of course, those animators actually drew new art. For example, you wouldn't have seen the Disney logo appear in their work, nor were Disney's characters reproduced verbatim.

          AI 'learning' is a bit of an overstatement. The models have the ability to detect that part of their training set is applicable to the description and how to convincingly blend that into other elements it pulls out of its training set, but it's not making a new creative work. The article illustrates this by showing examples where the AI just plopped the Getty Images watermark right into the result. The AI doesn't "know" that watermark is not part of the image, so it just pulled that in with the rest of the assets.

          • The article illustrates this by showing examples where the AI just plopped the Getty Images watermark right into the result. The AI doesn't "know" that watermark is not part of the image, so it just pulled that in with the rest of the assets.

            No no no, the AI was just drawing images in the style of Getty [slashdot.org].

            (Sarcasm, as always.)

          • It may draw a watermark in the same style but the model does not store images. That's not at all how it works.

            • You may convince a technically illiterate rube of that, but anyone with even minimal technical understanding sees that the encoding may obfuscate the fact, but the fact remains that the "AI" produces blended amalgamations of existing images. It does not "learn how" artists paint. It just rearranges what artists have already painted.
              • by lsllll ( 830002 )
                For it to rearrange what artists have already painted, doesn't it need constant access to the original images, or the need to have saved the original images? I think you're giving some of these AIs too little credit.
                • The "learning" that an AI does is not "acquiring a skill". It's memorizing. The pieces of the original images are encoded in the "model" which is the result of the training. When you ask an AI to paint something, it literally pieces together things that it "remembers".
                  • *You* acquiring a skill is memorising. All you're doing is training weights and biases in your brain over hundreds of iterations of looking and trying.

                    • That is a misrepresentation of the way humans learn. You do not learn intricate patterns of pixels, and you do not paint by piecing together memorized patterns. An artist can learn to paint in the style of van Gogh without memorizing a single one of his pictures in enough detail to reproduce a piece of it in a recognizable way.
                  • The "learning" that an AI does is not "acquiring a skill". It's memorizing.

                    What do you think acquiring a skill is?

                    Those who adopt your point of view will be forced to argue in court for the existence of an immortal soul, if they pursue this line of reasoning.

                    • Same mistake as the other troll. Acquiring a skill is not the same as memorizing things and putting them together in different ways. Are all of you getting through life by strictly imitating other people, like literally imitating motions without understanding purpose or even just cause and effect?
                • by Junta ( 36770 ) on Tuesday January 17, 2023 @04:28PM (#63217494)

                  It doesn't save the original images verbatim as a whole, it saves chunks of the original images. You don't have to copy an *entire* work verbatim to run afoul of copyright.

                  Anyone who has worked with "AI" knows that the training data "leaks through" in very recognizable, obvious chunks. Not "Hey, they drew a picture in the same style as some other picture" but "hey, that has bits of some other picture pasted in". Same with text: the models frequently dump out very distinct chunks of text verbatim from the training dataset.

                  This is a huge reason to be wary of using these approaches except in cases where you *know* the entire dataset is licensed/copyrighted in a fashion allowing for this sort of manipulation. If AI code generation is trained on open GitHub projects, brace yourself for potentially violating licenses. Same here: if the AI has chewed on Getty Images or Shutterstock content... brace yourself for having your output land you on the wrong side of a copyright case...

                  • by lsllll ( 830002 )

                    It doesn't save the original images verbatim as a whole, it saves chunks of the original images.

                    Are you sure about that? I do not believe that's the case. My understanding is that AI like Stable Diffusion breaks an image down to pixels to learn the relationship between those pixels and create contexts, then later when it's provided with contexts it'll reverse diffuse what it has come up with until it gets an image. You make it sound like it's a piece of software that's good at photoshopping a bunch of sections cut out of other images. If by sections, you mean down to the pixel level, then I guess

                    • by Junta ( 36770 ) on Tuesday January 17, 2023 @05:28PM (#63217706)

                      Ultimately that 'learned relationship between pixels' amounts to an encoding. A very lossy encoding with some interesting hooks for the model to be able to reference when and how it should be combined with others, but nonetheless an encoding.

                      It's not literally opening up Photoshop and copy/pasting, but it is combining the contexts in a way analogous to how that would work. The visualizations referenced in the article illustrate this principle pretty clearly.
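Several comments above disagree about whether a diffusion model "stores" its training images or only learns "relationships between pixels". As a concrete reference point, here is a minimal sketch of the forward (noising) step used by DDPM-style diffusion models, plus its inversion given a noise estimate. It is an illustration under stated assumptions, not Stable Diffusion's actual code: Stable Diffusion runs this process in a compressed latent space produced by an autoencoder rather than on raw pixels, and the function names, the 4-value toy "image", and the noise-schedule value used here are hypothetical.

```python
import math
import random


def forward_diffuse(x0, alpha_bar_t, rng):
    """Noise a flattened toy 'image' x0 (list of floats) to diffusion step t.

    Assumes the DDPM closed form x_t = sqrt(a) * x0 + sqrt(1 - a) * eps,
    where a = alpha_bar_t is the cumulative noise-schedule value in (0, 1].
    Returns the noisy sample and the Gaussian noise that was added.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    s_signal = math.sqrt(alpha_bar_t)
    s_noise = math.sqrt(1.0 - alpha_bar_t)
    x_t = [s_signal * p + s_noise * e for p, e in zip(x0, eps)]
    return x_t, eps


def estimate_x0(x_t, eps_hat, alpha_bar_t):
    """Invert the closed form above, given a noise estimate eps_hat.

    In a trained model eps_hat would come from the denoising network; here it
    is whatever the caller passes in (a hypothetical stand-in for that network).
    """
    s_signal = math.sqrt(alpha_bar_t)
    s_noise = math.sqrt(1.0 - alpha_bar_t)
    return [(xt - s_noise * e) / s_signal for xt, e in zip(x_t, eps_hat)]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0.1, 0.5, 0.9, 0.3]  # hypothetical 4-pixel "image"
    noisy, true_eps = forward_diffuse(image, alpha_bar_t=0.5, rng=rng)
    # Passing the true noise recovers the original up to float rounding;
    # a real network only approximates eps, so its reconstruction is lossy.
    print(estimate_x0(noisy, true_eps, alpha_bar_t=0.5))
```

In training, a network is taught to predict the added noise from the noisy sample; the sketch substitutes the true noise for that prediction, which is why the reconstruction is exact here. How closely a trained network's predictions can reproduce specific training images is the empirical question the commenters are arguing about.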

              • by ceoyoyo ( 59147 )

                Seems like the illiterate rube is convinced of the opposite, actually.

            • by gillbates ( 106458 ) on Tuesday January 17, 2023 @01:59PM (#63217002) Homepage Journal

              Storage of the images is not required for a copyright claim.

              Getty needs only to establish that Stable Diffusion (A) copied the work without their permission, and (B) that said copying affects the commercial value of the work.

              The first part establishes copyright infringement, and the second part negates the fair use defense. There is no necessity of distribution of copyrighted works to establish infringement of copyright. (Indeed, the RIAA lawsuits showed that they didn't need to prove that distribution had occurred to win a copyright claim - the copying was enough.)

          • Please define creativity. No matter how you define it, if it becomes law I can use it as a bludgeon on other artists to make sure that if their style even remotely resembles one in Getty's portfolio, it will be going to court.

            I really think artists should be very careful what they wish for.

            • by Junta ( 36770 )

              Getty isn't characterized by a 'style'. We aren't talking about visually similar, we are talking about being able to verbatim identify copy/pasted chunks of the source material.

              There's a lot of room for ambiguity when it comes to law and creative works, but copy/paste chunks of other pictures with a touch of photo editing to blend it with the rest of the picture would universally be seen as running afoul of copyright, unless there's some parody or fair use exemption in play. This is effectively how somethi

          • The article illustrates this by showing examples where the AI just plopped the Getty Images watermark right into the result. The AI doesn't "know" that watermark is not part of the image, so it just pulled that in with the rest of the assets.

            IMO, not acknowledging that Getty has public domain images with their watermarks all over them on their site is kind of problematic: that phrasing implies it's all "their" images carrying watermarks, and it ignores Getty's dodgy past of trying to claim licensing rights, if not outright ownership, over public domain works.

      • Most likely, if you want to look at the art online to learn from it, there's no copyright issue. It's only when you go mass scrape the site that you have broken the T&C and therefore are now subject to copyright infringement.

        For example, if you download an image and save it to your hard drive, then make other images from it using an AI, you have almost certainly infringed. Because AI cannot by law be a creator, an AI probably also cannot creatively transform a work, which would at least protect you from non-

        • by TWX ( 665546 )

          And what if the scraping-process simply analyzed each image for its constituent elements at several levels (think image software and filters for things like finding edges, mapping color palettes, facial recognition) where the original image could not be reconstructed from the analysis data, and then built images off of that analysis data instead?

          • If you never put one of the generated images online, it's probably fine. That's personal use and it's covered by fair use. But if you use that tool to generate derivative works and then publish those, you may be infringing. The rule is written such that this is OK only if you have significantly transformed the work in a creative way.

            For example, if you make a tool that scrapes all of Getty's images, then processes each one by adding a picture of your dog using a script, then this is almost certainly a violation o

        • you have broken the T&C and therefore are now subject to copyright infringement.

          But wouldn't this conflate a T&C violation and a copyright violation?

          I mean, a T&C violation can also be a copyright violation, but that doesn't automatically mean every T&C violation constitutes copyright infringement. For example, if a licensing agreement for a movie prohibits copying bits and pieces, and I use small bits in a review, sure, it'd need to be litigated to be certain, but this is a classic, bona fide example of fair use.

          • What I mean is that your activity - which might be construed as copyright infringement by the courts otherwise - may be protected by the T&C. But as soon as you break the T&C, you lose any extra protections it may have granted you.

        • by lsllll ( 830002 )

          Most likely, if you want to look at the art online to learn from it, there's no copyright issue.

          I don't think it's an issue of "learn" as much as it is style. Boris Vallejo [duckduckgo.com] has a very distinct style, as does Marc Chagall [duckduckgo.com]. If you go look at those and then paint something in one of those styles, you'd still be infringing, just like if you so much as use the first ten consecutive notes from Stairway to Heaven it'd be considered infringing.

          I've played with AI image generation. If you prompt it to "draw me a painting of two cats playing chess", it will, and it will use its knowledge to create the painting.

        • by ranton ( 36917 )

          It's only when you go mass scrape the site that you have broken the T&C

          US law currently allows screen scraping, regardless of T&C restrictions attempted by the site owner. Which makes sense, since a screen scraper doesn't do anything different than a web browser. It is just an HTTP request, and then the web server sends the content it wishes to. The client needs to store the text and images on the client either way.

          • Under copyright law you can scrape. But under copyright law you also cannot send that image to an online tool for analysis.

            • by ranton ( 36917 )

              Under copyright law you can scrape. But under copyright law you also cannot send that image to an online tool for analysis.

              What leads you to believe that? Copyright law just says no one can publish that image to make money off of it. An artist, art critic, student, AI researcher, etc. can do whatever analysis they want to on that image.

      • If Getty made these images available on the Internet without putting them behind some kind of login then they published them for all to look at. Undoubtedly some human art students have taken inspiration from what they've seen, and then turned around and created art based on that. If the plaintiff cannot demonstrate relationships between original and new works that show the exact same arrangement of elements, or cannot demonstrate how the AI went about generating its art, then it may be a tough case to actually prove that the AI is simply copying and then editing the existing images.

        Do they have to prove the relationship? I'm not sure Fair Use (or Fair Dealing in the UK) allows images to be downloaded, preprocessed, and then used as input for a for-profit ML model.

        This is going to be a deeply influential ruling either way as this new class of generative models really depends on those kinds of data sources. Not to mention other forms of big data scraping.

    • by quantaman ( 517394 ) on Tuesday January 17, 2023 @11:54AM (#63216516)

      The software is open source, even if the official models get taken down by DMCA, "AI Pirates" will continue to make models and will distribute them through more uncensored means. The copyright space is getting more and more loose every year. It looks like copyright lawyers are practicing for Mickey in 2024.

      Not necessarily. The tough part isn't the software (at least once it's written); the tough part is building the massive datasets [springboard.com] and then getting the giant cluster to train the models [openai.com].

      If the models themselves get out then certainly anyone can run them, but no one is going to recreate those models without a big team and some very deep pockets.

    • by ranton ( 36917 )

      The government won't even allow its copyright laws to be interpreted in a way which will significantly restrict AI research and development. The US is not going to want to cede technological superiority in this sector to other countries with less restrictive copyright laws.

      Using copyrighted material to train AIs is not commercial activity. It is training and education. Selling the results of the AI's work is commercial activity. I will be very surprised if future legal cases result in anything other than th

    • The images are not open sourced. This is not about “AI” (which, by the way, doesn’t exist and is a bullshit term). This case is about a company illegally accessing copyright-protected information to create baselines in its algorithm. You cannot take other people’s copyrighted work and build on it.
      • You've never heard of fair use, have you?

      • > The images are not open sourced.

        Seems completely irrelevant to me.

        Any artist can go to the Getty site, or any other site, that is showing even just thumbnails, much less the larger watermarked versions that are more typical, and brew up derivative ideas. Almost all visual art is highly derivative. Likewise music, another form of art. Considerable written fiction. Etc.

        ML applications like Stable Diffusion are doing exactly this. No more, no less. They're not copying works. They're creating derivative wo

      • illegally accessing

        Allegedly. Alleging a claim is true doesn't automatically make it so, that's what the litigation is for.
        That is the sort of overconfidence that led to Sony being beaten twice legally on emulation in the US, and to the music industry failing to take on decentralized P2P clients like Grokster & BitTorrent clients.

    • They're not AI pirated anyway. Copyright bans the distribution of the artwork. The AI isn't distributing the artwork, and nor are the creators of the AI.

  • by Travelsonic ( 870859 ) on Tuesday January 17, 2023 @10:33AM (#63216214) Journal
    I mean, not in terms of scraping IMAGES obviously, but the scraping of publicly available data for analytical purposes?

    This is also rich coming from a company that appears to attempt to license out (or claim the ability to license) public domain works (look at the licensing plan options on public domain works that are on their site, smeared with their logo feces), and whose subsidiary tried to claim rights over a public domain image of the coronavirus.
  • I've downloaded (in my web browser) & processed (looked at) thousands of copyright images on the interwebs pipes. Who do I need to contact to turn myself in? Where should I go?
    • That's fair use. No issue here.

      But I assume you didn't then put those images up on a website for others to see, which would clearly be copyright infringement.

      • Not exact copies. I reproduced something a bit like a lot of them kinda mixed up together. I call it inspiration. Does that count as copyright infringement?
        • It depends. The courts decide on a case by case basis whether the "inspiration" counts as a creatively transformative work. For example, if you take 4 Andy Warhols and you put them in a 2x2 tile, that's clearly infringement. For something a bit more complicated, see the discussion here on photo mosaics: https://www.avvo.com/legal-ans... [avvo.com]

  • If you can't claim copyright through AI usage you shouldn't be liable for infringement from it either.

    it's a joke, meta-mod, give me AI moderation please!
    Open the door please Hal.

    Shalmaneser save us! Can you still Stand On Zanzibar?

  • Just watching and learning.

    • by leptons ( 891340 )
      It's definitely also copying.
    • However, Getty Images is copying and selling Public Domain content they don't own. Getty scrapes the Library of Congress and slaps on their own copyright. Shorpy, Alamy, and others do the same. I don't understand how it is legal for these companies to steal photographs from us, the public. They need to offer Public Domain photos as a free public service or be sued out of business.
  • by dmay34 ( 6770232 ) on Tuesday January 17, 2023 @11:23AM (#63216380)

    Just having seen and studied prior art does not make all future art produced by an artist infringement.

  • I wrote about copyright vs AI [aardvark.co.nz] a day or two ago and pointed out that copyright issues are going to be a real problem with the way AI systems are being trained and with the results they're producing:

  • No Disassemble, Getty Images!!
