Meta Has Created a Way To Watermark AI-Generated Speech (technologyreview.com) 64

An anonymous reader quotes a report from MIT Technology Review: Meta has created a system that can embed hidden signals, known as watermarks, in AI-generated audio clips, which could help in detecting AI-generated content online. The tool, called AudioSeal, is the first that can pinpoint which bits of audio in, for example, a full hourlong podcast might have been generated by AI. It could help to tackle the growing problem of misinformation and scams using voice cloning tools, says Hady Elsahar, a research scientist at Meta. Malicious actors have used generative AI to create audio deepfakes of President Joe Biden, and scammers have used deepfakes to blackmail their victims. Watermarks could in theory help social media companies detect and remove unwanted content. However, there are some big caveats. Meta says it has no plans yet to apply the watermarks to AI-generated audio created using its tools. Audio watermarks are not yet adopted widely, and there is no single agreed industry standard for them. And watermarks for AI-generated content tend to be easy to tamper with -- for example, by removing or forging them.

Fast detection, and the ability to pinpoint which elements of an audio file are AI-generated, will be critical to making the system useful, says Elsahar. He says the team achieved between 90% and 100% accuracy in detecting the watermarks, much better results than in previous attempts at watermarking audio. AudioSeal is available on GitHub for free. Anyone can download it and use it to add watermarks to AI-generated audio clips. It could eventually be overlaid on top of AI audio generation models, so that it is automatically applied to any speech generated using them. The researchers who created it will present their work at the International Conference on Machine Learning in Vienna, Austria, in July.

This discussion has been archived. No new comments can be posted.

  • by Kiliani ( 816330 ) on Tuesday June 18, 2024 @10:48PM (#64559823)

    I believe it when I see it.

    • by Tom ( 822 )

      Hear it. Not see it.

    • Re:Really? (Score:4, Informative)

      by Rei ( 128717 ) on Wednesday June 19, 2024 @05:53AM (#64560327) Homepage

      It's going to fall to one of the same problems as Glaze: trivial removability. These people never bother to red-team their "solutions". For example:

      ffmpeg -i watermarked_audio.flac -af "arnndn=m='one-of-ffmpegs-noise-removal-neural-networks'" audio_with_watermark_removed.flac

      I can almost guarantee you that will destroy their watermark, *and* improve your audio quality at the same time.
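To make the parent's attack concrete, here is a toy sketch — not ffmpeg's actual arnndn model, just a crude moving-average filter standing in for a denoiser — showing how easily low-level filtering wipes out a faint additive watermark. All signals and amplitudes here are made up for illustration:

```python
import math

RATE = 16000
host = [math.sin(2 * math.pi * 440 * t / RATE) for t in range(RATE)]          # audible 440 Hz tone
mark = [0.01 * math.sin(2 * math.pi * 7000 * t / RATE) for t in range(RATE)]  # faint 7 kHz "watermark"
marked = [h + m for h, m in zip(host, mark)]

def lowpass(x, k=5):
    # crude moving-average low-pass filter, a stand-in for a real denoiser
    return [sum(x[max(0, i - k):i + k + 1]) / (min(len(x), i + k + 1) - max(0, i - k))
            for i in range(len(x))]

def detect(x, m):
    # project x onto the watermark; close to 1.0 when the mark is present at full strength
    return sum(a * b for a, b in zip(x, m)) / sum(b * b for b in m)

print(detect(marked, mark))           # near 1.0: watermark present
print(detect(lowpass(marked), mark))  # collapses after filtering
```

The filter barely touches the audible 440 Hz content but attenuates the high-frequency mark to a few percent of its strength, which is the general failure mode of any watermark quiet enough to be imperceptible.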

      • Re:Really? (Score:4, Insightful)

        by DarkOx ( 621550 ) on Wednesday June 19, 2024 @06:56AM (#64560427) Journal

        Right, I don't understand how water marking is expected to resist such attacks for any amount of time.

        For the mark to be useful it has to be reliably identifiable. If you can spot it, naturally you are going to be able to remove it, and you should be able to fill the gaps left behind with the same ML technology we use now to 'fix' damaged audio and images. If stripping the mark removes so much data that this can't be done, then the mark was so aggressive it won't be acceptable for use in the first place, except perhaps for things like official documentation / records; and for that case you'd just digitally sign.

        • not to mention you have to rely on AI companies adding a watermark to the generated audio in the first place.
        • by HiThere ( 15173 )

          I sort of expect it would be something like stairstepping the middle part of sounds increasing in volume. That would be removable, but not trivially so. And with a fine-grained step size it wouldn't be obvious to anyone listening. Or you could do something similar with frequency shifts.

          If you want to preserve the watermarks, you'd need to do something like using codecs that signed the stream (which could be done outside the frequency range of human hearing).

    • All videos of Biden are fake.
      • Yeah, the obvious black hat move here is to apply the "made by AI watermark" to real video.

        Actual black hat content creators aren't going to watermark their output.

        It's sort of amazing that with machine learning tech long gone out in the wild, that people keep trying to appear credible when they do things like bowdlerize, de-celeb, watermark (which is inevitably going to be reversible as with every digital watermarking tech, ever... in fact, it'll be mildly hilarious when machine learning is used to remove

  • by scrccrcr ( 1985186 ) on Tuesday June 18, 2024 @10:53PM (#64559829)
    then programs can be written to remove it
    • by Logger ( 9214 )

      So much time, effort, and resources spent avoiding the real solution of cryptographically signing everything.
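As a minimal sketch of the signing idea, using only Python's standard library: real provenance systems would use asymmetric signatures and key infrastructure, but even a toy HMAC over the audio bytes shows the property that matters, namely that any edit invalidates the tag:

```python
import hashlib
import hmac

def sign(audio_bytes: bytes, key: bytes) -> str:
    # tag the exact bytes of the recording with a keyed hash
    return hmac.new(key, audio_bytes, hashlib.sha256).hexdigest()

def verify(audio_bytes: bytes, key: bytes, tag: str) -> bool:
    # constant-time comparison against a recomputed tag
    return hmac.compare_digest(sign(audio_bytes, key), tag)

key = b"creator-secret-key"   # hypothetical creator key, illustration only
clip = bytes(range(16))       # stands in for raw audio bytes
tag = sign(clip, key)

print(verify(clip, key, tag))            # True: untouched clip verifies
print(verify(clip + b"\xff", key, tag))  # False: any edit breaks the tag
```

Unlike a watermark, nothing here hides inside the audio, so there is nothing to scrub out; the open question, as the replies note, is trusting the key holder, not the math.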

      • by dinfinity ( 2300094 ) on Tuesday June 18, 2024 @11:12PM (#64559861)

        Agreed. The challenge is not to detect fakes, but to know what is real.

        • No, the challenge is trustworthiness. Something that digital signatures and cryptography are ill suited to address.

          Why? Digital signatures only indicate whether or not something has been altered since the signature was made. They cannot indicate whether or not the person or thing who made the signature is trustworthy. That requires far more context than what a digital signature can provide alone. (Is the person / thing acting of their own will? Has subterfuge occurred? Was the original sabotaged before the
          • We're all on the same page here. Neither GP nor I stated that digital signatures were enough by themselves to know what is real. Remember that the thing we're arguing against is watermarking fake shit as a 'solution' to a rapid increase in fake content.

            Take your real relationships and bonds of trust, but do not include cryptographic signing in the solution. Will it work? Add in the AI watermark crap to the approach if you want.
            Grandma gets a video from her real daughter she sees and talks to every week, wi

            • The problem with the 'detect fakes' approach is that you can have a detector that works 100% on Monday and which somebody learns to fool some of the time on Tuesday. You get a video sent to you on Wednesday. How sure can you be that it is real?

              Answer: It requires knowledge and skill to detect the fake, as well as the vigilance required for such scrutiny to be regularly applied. Something that no static set of instructions without the capability to reason is capable of. As you've already shown in your hypothetical.

              Compare that to the situation with cryptographic signing: When you get a signed video, you know that some pretty tricky shit has had to have happened (stolen private key) for it to be not from that person.

              Or that someone got lucky and picked an algorithmically equivalent number before the heat death of the universe.

              Or that some bug in the program / OS / firmware / etc. patched out the "reject faulty signature" code and replaced it wit

              • Answer:

                That was not an answer, but a vaguely related comment. It's a simple question: How sure can you be that it is real?

                Or that someone got lucky and picked an algorithmically equivalent number before the heat death of the universe.

                Don't be obtuse. You're not a teenager.

                Or that some bug in the program / OS / firmware / etc. patched out the "reject faulty signature" code and replaced it with the "accept valid signature" code.
                Or that some bug in the program / OS / firmware / etc. intercepted the error message and replaced it with an "everything is fine" message.

                These two are the same thing, but fair enough. That is indeed a possible failure case. One that I would put very much in the "pretty tricky shit has had to have happened" category. I never said that digital signatures are perfect or unbreakable.
                See: https://en.wikipedia.org/wiki/... [wikipedia.org]

                Once again, you're treating the digital signature as if the trusted person themselves has signed it and handed it to you directly with no intermediaries.

                Wrong. I was and am saying that "detecting fakes" is a far inferior soluti

      • "Cryptographically signing everything" makes as much sense as "signing every sketch I draw and back-of-the-napkin note I write".

      • by DarkOx ( 621550 )

        Digital signatures don't really solve this problem.

        The problem people want to solve here is keeping the statement 'Pics(or audio etc) or it didn't happen' truth-y.

        (I am not saying either of these events actually happened or that any group made such edits, just using a 'ripped from the headlines' type story/example here)

        A video drops that appears to show something one faction really wants to be true and another really wants to be false. - Like say 'see the President is a vegetable' One "recording" is released by a f

      • by HiThere ( 15173 )

        Yeah, but the signatures need to be enforced by the codec, and the ones in general use don't even recognize them. Good luck getting people to switch. It's probably doable, but it would require substantial investment, and probably a decade or so of time invested. Watermarks are an easy and quick partial solution. (How big a part? It's going to be a game of whack-a-mole.)

        • The codec cannot protect against a compromised key. Nor can it protect against a falsified (staged) image. Nor can it even protect itself. (The code is running on a device that may not be trustworthy, or a device that was compromised by its owner.) Demanding that the codec be the arbiter of trust is just demanding that someone else lie to you about the trustworthiness of something so you can claim ignorance when your laziness inevitably bites you in the ass.
    • by Shaitan ( 22585 )

      If it can be detected for validation then it can be detected for removal [or effective removal by scrambling so it fails detection]... of course it might damage the content.

      If only there were some technology which was good at pattern matching and content generation which could detect the watermark artifacts and generate plausible new fakes to cover it up...

      • by Rei ( 128717 )

        Yeah, every watermarking and adversarial noise problem runs up against the fact that generative AI can inherently replace whatever low-level imperceptible details you add with different low-level imperceptible details - and it doesn't even need to be optimized to an anti-watermarking / anti-adversarial-noise role to do so.

        Let alone the fact that you can inherently train neural networks to *specifically* denoise media.

    • Why remove it when nothing compels a propagandist to include watermarks in the first place?
    • by hAckz0r ( 989977 )

      Simple, you decompress the audio stream and re-compress it. If a lossy compression algorithm is used just once then the watermark will no longer authenticate that as the original work. So you can use this kind of watermarking to authenticate an original work, but not to mark all AI-generated work as unreliable or faked. Removing it removes the proof that the audio is real.

      The best way to leverage this is to have all Non-AI audio watermarked by the author as proof of pedigree. Anything not marked is then su
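The decompress/re-compress attack described above can be sketched with a deliberately fragile LSB-style watermark. AudioSeal does not use LSB embedding; this toy merely illustrates why one lossy pass can destroy a mark hidden in imperceptible detail:

```python
samples = [1000, -2000, 3000, -4000, 5000]   # fake 16-bit PCM samples
bits = [1, 0, 1, 1, 0]                       # watermark payload

def embed(samples, bits):
    # hide one payload bit in each sample's least significant bit
    return [(s & ~1) | b for s, b in zip(samples, bits)]

def extract(samples):
    return [s & 1 for s in samples]

def lossy_roundtrip(samples, step=16):
    # model a lossy codec pass as requantization to a coarser step size
    return [round(s / step) * step for s in samples]

marked = embed(samples, bits)
print(extract(marked))                    # payload survives lossless storage
print(extract(lossy_roundtrip(marked)))   # one lossy pass scrambles it
```

Any mark that lives entirely in detail a codec is free to discard shares this fate, which is why the comment distinguishes "authenticate the original file" from "mark all copies forever".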

    • then programs can be written to remove it

      Sure, just like you can remove pee from a public pool.

      It is entirely possible to write transforms that survive extreme editing...

  • by Ksevio ( 865461 ) on Tuesday June 18, 2024 @11:05PM (#64559855) Homepage

    This is as useful as creating software to add watermarks to images to detect photoshops.

    It might have worked if it was implemented into software originally as some sort of standard and validation, but now there's open source software out there that can mix audio and train models on voices that anyone can use.

    • by AmiMoJo ( 196126 )

      It doesn't have to be perfect to be useful. Most users aren't able to set up their own local open source AI voice generation software, they will go to some website recommended by a Facebook ad. Even if it only makes 10% of these fakes detectable, that's still an improvement over 0%.

  • by iAmWaySmarterThanYou ( 10095012 ) on Tuesday June 18, 2024 @11:38PM (#64559887)

    I put watermarks on all the fake clothing, purses, cash, and electronics I ship.

    Because obviously criminals and other assorted ne'er do wells are certainly going to choose the software that watermarks over the ones that don't.

    For my next trick, I will make all crime illegal!

  • by ihadafivedigituid ( 8391795 ) on Wednesday June 19, 2024 @12:48AM (#64559941)
    I was director of R&D for the original downloading jukebox company and demonstrated forensic watermarking of music files at Universal Records to one of their veeps in spring 2000 or so. We didn't develop it, but I selected the now long-forgotten solution from maybe three available choices so it wasn't super brand-new.

    Even back then it worked pretty well and survived all kinds of audio file mangling while not really trashing the file (the other two choices were either not as robust or generated audible artifacts). Encoding time was more than doubled, IIRC, which meant that we had a bunch of 1U boxes racked up in the ripping & encoding farm to process everything in parallel. We had 15-20k CDs on the premises by late 2002 or so when encoded feeds from the labels started to appear. Nostalgia, it was a fun time.

    Glad Facebook re-discovered tech that was in use when Zuck was in high school.
    • Since you seem to have knowledge of this tech, do you think it will actually work this time around?

      • Sure, at least against casual users. What we used survived a lot of mangling and remained detectable. But people didn't have the kinds of computing horsepower and advanced software we have now so it might be much easier to scrub or alter a watermark without trashing the file now.
    • You should talk to some of the folks higher up who are convinced that watermarking is useless because it can be removed with minor editing.

      lol, the naivete is adorable.

      (it actually can be removed, especially if you know how it was created, but removal is not guaranteed to be possible)

      • Minor editing is a no-go, but I'm not sure if an AI-assisted watermark scrub tool is out of the question. LLMs are great at pattern recognition, and if you knew what to tell it to look for then it might be able to remove or mask the watermark pretty quickly.
  • They don't even bother removing fake ads when those are reported.
    Who are they trying to convince?

  • How is that different from watermarking non-AI content?
  • This is dangerous. (Score:5, Insightful)

    by Shaitan ( 22585 ) on Wednesday June 19, 2024 @03:20AM (#64560111)

    If a tool can detect it there is nothing to stop an AI from being trained as an adversarial network to mask the watermarks. People will then use this for validation, and audio which passes it will be treated as more trustworthy than audio which fails. Meanwhile, a smart adversary doesn't reveal their hand after cracking Enigma, but masks just enough fake audio to turn the tide of war, or else the watermarking tool would be updated.

    • If a tool can detect it there is nothing to stop an AI from being trained as an adversarial network to mask the watermarks.

      LOL. It seems sooooooo obvious that it MUST be possible. Go ahead and try it. I will wait a thousand years for your results. I give you less than a 20% chance at succeeding in those thousand years. I give you zero chance of doing it within your normal lifetime.

      I am kind of insulting you because I know there are people who have MUCH better chances at succeeding, but even their success is not guaranteed. This subject is MUCH deeper and nuanced than you even suspect... which is why I am giving the 20% chance r

      • by Shaitan ( 22585 )

        "I am kind of insulting you"

        Which is interesting since you don't know anything about me and I don't recall ever exchanging comments with you.

        "This subject is MUCH deeper and nuanced than you even suspect"

        Oh it is, is it? Fascinating, Mr Random Troll with a posting history that not only shows no indication of any particular knowledge of audio, audio watermarking, machine learning, or Artificial Intelligence but exhibits no especial expertise or insight on any subject. If you actually have some sort of nuance

        • When I said, "I am kind of insulting you", I was acknowledging that there is no way to say something negative without it being perceived as an insult to you. I am genuinely not trying to insult you, but yeah.

          Fascinating, Mr Random Troll with a posting history that not only shows no indication of any particular knowledge of audio, audio watermarking, machine learning, or Artificial Intelligence but exhibits no especial expertise or insight on any subject.

          That comment points straight back at you as you have no identifiable expertise either, so here we are.

          I had thought about writing why your view is too basic to be a valid attack against watermarking, but let's be honest, you wouldn't have read it. There are too many concepts you are unfamiliar with to e

          • by Shaitan ( 22585 )

            "That comment points straight back at you as you have no identifiable expertise either"

            On the contrary. Anyone with expertise on the topic would [and did] recognize my comment directly targeted the nuance of the subject matter when I referenced a reliable detection tool that could be used for training and adversarial network training. Further I have decades of comment history.

            "Do you recall a song where Aretha Franklin let her soul speak while uttering the letters R E S P E C T? Have you ever heard anyone e

            • Those are not watermarks and the entire point of the watermark is that a human cannot detect it at all.

              Why would it need to be undetectable by a human?

              It appears you are thinking of this issue from the point of view of, "here is an audio file, add some watermarking to it that is indistinguishable from background noise".

              Why you would want to limit yourself in such a way is beyond me.

              The watermark could be an "instrument". The instrument is a carefully crafted percussive instrument, but the sound, when examined, is actually a cryptographic construct. To you, the listener, it sounds rather like a light bongo dr

              • by Shaitan ( 22585 )

                "Why would it need to be undetectable by a human?"

                If you can hear the watermark you've destroyed the audio when watermarking it because you've added something that isn't a legitimate part of the content.

                "The instrument is a carefully crafted percussive instrument, but the sound, when examined, is actually a cryptographic construct. To you, the listener, it sounds rather like a light bongo drum that is artistically part of the music, but to a computer, it sees a repeating and resistant to compressing cryptog

                  • If you can hear the watermark you've destroyed the audio when watermarking it because you've added something that isn't a legitimate part of the content.

                  *sigh*

                  It takes inspiration from the phone ring and produces a new phone ring effect which is just as good but doesn't have any bongos and it doesn't actually use any binary data from the original, particularly not your cryptographic signature.

                  *sigh*

                  Words can communicate ideas; however, the person listening has to actually process the words for communication to take place. Let's try this from a different angle.

                  The artist gets choices in how they want watermarking to occur. I used a "bongo" type sound as an example. I get it, not enough example was given.

                  Have you ever heard of the artist named T-Pain? Everyone uses autotune to make their voices sound more "perfect". He used it to sound unique. The autotuning could be done in such a way that

                  • by Shaitan ( 22585 )

                    "The artist gets choices in how they want watermarking to occur. I used a "bongo" type sound as an example. I get it, not enough example was given.

                    Have you ever heard of the artist named T-Pain? Everyone uses autotune to make their voices sound more "perfect". He used it to sound unique. The autotuning could be done in such a way that a cryptographic signature is embedded in the artists vocal track which is then subsequently mixed with the music. Good luck getting rid of that watermark."

                    Even if Hans Zimmer

                    • I'm not interested in watermarking anything but if I did want to watermark audio I would only want a tool I used for that purpose doing so.

                      Stop thinking of yourself so you can hear what is being told to you: The Artist/Creator/Originator/whateverthefuckyouwanttocallthepersonwhocreatestheoriginalaudio will be the one implementing it. They have full control over what sounds actually come out. And because it is fully integrated, not even AI can fully remove it without fully destroying and reimplementing the audio... but even then, timing is what is critical and if you change the timing, it is not really even the same song.

                    • by Shaitan ( 22585 )

                      "The Artist/Creator/Originator/whateverthefuckyouwanttocallthepersonwhocreatestheoriginalaudio will be the one implementing it."

                      Perhaps with your imaginary and completely made up fantasy watermarking system that bears no resemblance to the one in the TFA. But we aren't discussing that.

                      We are discussing... "Meta has created a system that can embed hidden signals, known as watermarks, in AI-generated audio clips, which could help in detecting AI-generated content online." with the objective that it "be overla

  • To separate AI generated content from actual content for training their own AI's.

    Meta doesn't care about rights-management, or content made by others, or even deepfakes. The most pressing reason is that you do not want to train an AI on AI generated content. They need this watermark, to avoid this. That is the reason it is also open-sourced. Expect all AI businesses to come up with some sort of watermark in the near future.

  • I'm unclear on how this is supposed to work... So how is Meta going to force people to use this technology? If they are trying to fool you with AI then they will use the AIs that don't have this watermark. What is the point of using AI if people will know it is AI? That would be like a politician pointing out every lie he is about to tell you. The whole point of current AI is to generate BS.
  • I expect ways to remove it in the near future.

  • Or sell the "service" of exempting others. Which then will lead to using AI to get around it.

    Fuck everyone who works in this area. They're 100x creepier than people who design fake plastic plants.
  • by Posthoc_Prior ( 7057067 ) on Wednesday June 19, 2024 @09:10AM (#64560787)

    I work on similar related technologies; so I read the paper:

    https://arxiv.org/pdf/2401.17264

    on how it works. I'll briefly describe how it works and why this watermark is easily broken.

    First, how it works is straightforward. It randomly selects points in the time series of the audio file and superimposes a unique identifying signal onto each of those points. That is, superposing two waves produces a third wave that is the sum of the two. Then, to verify the watermark, you can subtract one wave from the other. The key in the technology seems to be the ability to add it at points in the audio file where it can't be detected by the human ear.

    It won't work as watermark technology for two reasons. The first is that the file can be searched, minute by minute, for a watermark. This means that it's sample-level localized. In other words, if you pick any two random points in the audio file, you can use a series of random guesses to find where the watermark is in linear time. Then, once you have this first watermark, you can find all the other watermarks because they're linearly spaced (they have to be, because this is how sample-level localization works). And this is why it won't work: there is no lossless compression of the two superposed waves.

    That is, in order to defeat removal of the watermark, there has to be no discernible difference between the watermark and the audio file. This can only be done with lossless compression. Since this is not part of AudioSeal, it can be assumed that if attackers want to remove a watermark, they can do so, because the watermarks are easily identifiable.
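A toy version of the embed-and-detect scheme as described above: a fixed additive signature placed at chosen points, with detection by correlating each window against the signature. The actual AudioSeal uses a learned generator/detector pair, so this is only a structural sketch with made-up parameters:

```python
import math
import random

random.seed(0)
SIG_LEN = 64
# fixed low-amplitude signature waveform, illustrative only
signature = [0.02 * math.sin(2 * math.pi * 12 * t / SIG_LEN) for t in range(SIG_LEN)]

def embed(audio, positions):
    # superimpose the signature at each chosen sample offset
    out = list(audio)
    for p in positions:
        for i, s in enumerate(signature):
            out[p + i] += s
    return out

def detect(audio, threshold=0.5):
    # slide over the audio in signature-sized windows; flag strong correlations
    hits = []
    sig_energy = sum(s * s for s in signature)
    for p in range(0, len(audio) - SIG_LEN, SIG_LEN):
        window = audio[p:p + SIG_LEN]
        score = sum(a * s for a, s in zip(window, signature)) / sig_energy
        if score > threshold:
            hits.append(p)
    return hits

audio = [random.gauss(0, 0.001) for _ in range(1024)]  # quiet noise "host"
marked = embed(audio, [128, 512])
print(detect(marked))   # -> [128, 512]
```

Note how the detector's strength is also its weakness: the same sliding correlation an auditor runs is exactly the search an attacker runs to localize and excise the marks.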

    • by Mal-2 ( 675116 )

      Making the watermark inaudible might be as simple as putting it "out of band" like FM Stereo does. There is "room" to hide things above 14kHz, and especially above 18-19 kHz.

      • Thanks! I didn't know this.
        • by Mal-2 ( 675116 )

          There's lots of bandwidth available just short of the Nyquist frequency, all of which our ears are terrible at detecting, and that's assuming the minimum "modern" sample rate of 44.1 kHz -- probably a good assumption if you want to make sure the watermark survives transcoding under all circumstances.
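A quick sketch of the out-of-band idea: at a 44.1 kHz sample rate the Nyquist limit is 22.05 kHz, so a carrier at 20 kHz is representable but essentially inaudible, and a single-bin DFT check finds it. The carrier frequency and amplitude here are illustrative, not from any real scheme:

```python
import math

RATE = 44100
MARK_HZ = 20000  # hypothetical out-of-band watermark carrier

def bin_power(samples, freq, rate):
    # magnitude of the DFT evaluated at a single frequency (Goertzel-style)
    re = sum(s * math.cos(2 * math.pi * freq * n / rate) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / rate) for n, s in enumerate(samples))
    return math.hypot(re, im) / len(samples)

n = 4410  # 100 ms of audio
audible = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(n)]
marked = [a + 0.005 * math.sin(2 * math.pi * MARK_HZ * t / RATE)
          for t, a in enumerate(audible)]

print(bin_power(audible, MARK_HZ, RATE))  # essentially zero without the mark
print(bin_power(marked, MARK_HZ, RATE))   # clear energy at the carrier
```

The obvious counter, in keeping with the rest of the thread, is that a transcode to a lower sample rate or a simple low-pass at 16 kHz removes everything above the audible band along with the mark.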

    • What you are saying is that they should have left it to the professionals rather than try to roll their own. Their method is pure naive trash. Did these guys graduate college at any point?

  • The real value in doing this, as far as the AI companies are concerned, has nothing to do with the protection of end users. This is so that future AI can easily remove past-AI-generated drivel from its training input. Do you really think they're doing it for us?

  • AI-generated audio should lack any background noise/hiss, and should be easy to detect. If somebody programs it to generate such background noise, that noise might be less entropic than actual background noise.
