Google's SoundFilter AI Separates Any Sound or Voice From Mixed-Audio Recordings

Google's SoundFilter AI Separates Any Sound or Voice From Mixed-Audio Recordings (venturebeat.com) 28

Posted by msmash on Friday November 13, 2020 @02:07PM from the closer-look dept.

Researchers at Google claim to have developed a machine learning model that can separate a sound source from noisy, single-channel audio based on only a short sample of the target source. In a paper [PDF], they say their SoundFilter system can be tuned to filter arbitrary sound sources, even those it hasn't seen during training. From a report: The researchers believe a noise-eliminating system like SoundFilter could be used to create a range of useful technologies. For instance, Google drew on audio from thousands of its own meetings and YouTube videos to train the noise-canceling algorithm in Google Meet. Meanwhile, a team of Carnegie Mellon researchers created a "sound-action-vision" corpus to anticipate where objects will move when subjected to physical force. SoundFilter treats the task of sound separation as a one-shot learning problem. The model receives as input the audio mixture to be filtered and a single short example of the kind of sound to be filtered out. Once trained, SoundFilter is expected to extract this kind of sound from the mixture if present.
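To make the "mixture plus one short example" interface in the summary concrete, here is a toy sketch in Python. It is not the SoundFilter model, which is a trained neural network. The hand-written spectral mask, the function name `conditioned_filter`, the pure tones, and the requirement that the example be as long as the mixture are all assumptions made for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for short toy signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def conditioned_filter(mixture, example):
    """Keep the parts of `mixture` whose frequencies are prominent in `example`.

    Hypothetical stand-in for a learned, conditioned separator: the example
    only decides which frequency bins are allowed through.
    """
    example_mag = [abs(c) for c in dft(example)]
    peak = max(example_mag)
    mask = [m / peak for m in example_mag]  # soft mask, values in [0, 1]
    mixed_spectrum = dft(mixture)
    return idft([m * c for m, c in zip(mask, mixed_spectrum)])


N = 64  # samples per signal (assumed; example and mixture share this length)


def tone(bin_index, phase=0.0):
    """A sine wave that completes `bin_index` cycles over N samples."""
    return [math.sin(2 * math.pi * bin_index * t / N + phase) for t in range(N)]


target = tone(5)                 # the sound we want to keep
interference = tone(12)          # the sound we want to suppress
mixture = [a + b for a, b in zip(target, interference)]
example = tone(5, phase=1.0)     # same kind of sound as the target, different phase

estimate = conditioned_filter(mixture, example)
max_error = max(abs(e - t) for e, t in zip(estimate, target))
```

In this toy the example never contains the mixture itself. It only tells the filter which frequencies to keep, which is loosely the role the summary gives to the short sample of the target source. A learned model would replace the fixed mask with something trained on many mixtures.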


Combine with deepfake tech (Score:1)

by no0ne108 ( 7412068 ) writes:

What could go wrong?
So now (Score:5, Insightful)

by Rosco P. Coltrane ( 209368 ) writes: on Friday November 13, 2020 @02:13PM (#60720378)

even if you put the radio on to cover your discussion, the Google surveillance collective can still make out what you're saying.
Everything that company does is creepy on one lever or another...

- Re: (Score:2)
  
  by zenlessyank ( 748553 ) writes:
  
  Just move the fulcrum.
- Re: (Score:3)
  
  by omnichad ( 1198475 ) writes:
  
  Amazon: Let's just put a nice microphone array in our smart speakers so we can always hear what you're saying.
  Google: Let's throw the cheapest hardware out there and just fix it with software
If it does what I think it does, this is real AI (Score:4, Insightful)

by mark-t ( 151149 ) writes: <markt AT nerdflat DOT com> on Friday November 13, 2020 @02:44PM (#60720566) Journal

The ability to pick out a single speaker from a cacophony filled room with lots of other people talking or making noise such as at a night club or party is actually quite an incredible trait that humans have. If this software can do that from a short audio sample of the target, I'd call that real AI.

- Re: (Score:2)
  
  by ljw1004 ( 764174 ) writes:
  
  > The ability to pick out a single speaker from a cacophony filled room with lots of other people talking or making noise such as at a night club or party is actually quite an incredible trait that humans have. If this software can do that from a short audio sample of the target, I'd call that real AI.
  I can't do it myself. I'm 45 years old now and have never been able to follow conversations in night clubs, loud parties, nor even most restaurants when I'm in a group and am trying to hear the people on the other side of the table. I can hear quiet sounds well - my brain just doesn't convert the sounds into words. I suspect normal people do better than me maybe because they subconsciously lip-read a little, or maybe because they're used to filling in the blanks much more than I do.
  - Re: (Score:2)
    
    by Cederic ( 9623 ) writes:
    
    That may mean you have an underlying issue - e.g. https://www.healthyhearing.com... [healthyhearing.com] (which obviously doesn't mean that you're autistic; this and other hearing issues can happen as part of, or independent from, that).
    I have a similar issue, plus mild deafness, so I do tend to benefit from seeing the lips of the speaker too.
- Re: (Score:2)
  
  by bill_mcgonigle ( 4333 ) * writes:
  
  Jungle animals can usually exhibit some type of Cocktail Party Effect too.
  Ours is tuned for our language, but hearing mating calls over the din has a reproductive advantage too.
- Re: (Score:1)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- No, it isn't. (Score:2)
  
  by SinGunner ( 911891 ) writes:
  
  "Real" AI understands what it's doing. What exactly do you think this is doing that makes it more "real" than picking a stop sign out of a picture?
  - Re: (Score:2)
    
    by mark-t ( 151149 ) writes:
    
    Do *YOU* understand exactly what you are doing when you can pick out a single voice in a loud room?
    - Re: (Score:2)
      
      by SinGunner ( 911891 ) writes:
      
      No, because it is not an act involving intelligence. There is nothing I could study in an intellectual capacity to improve this auditory faculty. It is no more indicative of intelligence than being able to tell green light from red light. Now if the AI did something like, I don't know, create a program to help it separate out the different voices? That would be an act of intelligence, which is what the humans here have done. This is just an application of machine learning.
      - Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        It appears you have invented a rather arbitrary definition of intelligence that has no bearing on how it is ordinarily defined by the rest of the world.
        Every creature on the planet with a nervous system is intelligent, but your criteria for intelligence would preclude that.
        
        Re: (Score:2)
        
        by SinGunner ( 911891 ) writes:
        
        The definition of intelligence is hotly debated, but it is almost universally agreed that an evolutionary algorithm (as used here) does not remotely qualify as intelligent. Intelligence tends to be considered a proactive quality, whereas machine learning and even a nervous system are only reactive. Given the fact that you haven't made the slightest effort to explain how you feel this latest advancement differs from the basic pattern matching that machine learning uses for images, I'm beginning to doubt yo
        
        Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        It differs from basic pattern matching in that it requires only a small sample of the target to be listened to, and not an exhaustive database of everything that it needs to filter out or listen for.
        
        Re: (Score:2)
        
        by SinGunner ( 911891 ) writes:
        
        But that's exactly how machine learning works. Creating the model takes tons of data, but once it's created it can be used on a small sample. There doesn't appear to be anything special going on here. This doesn't even strike me as an iterative improvement, more like an advertisement for the latest Google product.
        
        Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        I don't know if you realized this or not, but natural learning takes a ton of data too.
        
        Re: (Score:2)
        
        by SinGunner ( 911891 ) writes:
        
        Evolutionary learning does, yes. But "intelligent" learning allows us to bypass much of the brute force nature of "natural" learning. Which is why this isn't remotely "Real AI", which has been my point from the start. I could have never been in a kitchen in my life but still figure out how to make a sandwich relatively easily. I wouldn't need to run an evolutionary model based on thousands of ways to stack breads, meats and cheeses that is then routinely critiqued by a sandwich maker (which is how machi
        
        Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        And just how long do you think it takes someone who is not a linguist savant to learn a language without having any prior exposure to it?
        My point being that it generally requires far more than just a small sample. Yet once a language is known and understood (vast amounts of data), you can effectively communicate a concept in that language simply by knowing what concept to communicate (small amounts of data).
        Or do you believe that the ability to coherently communicate ideas is not a demonstration of in
        
        Re: (Score:2)
        
        by SinGunner ( 911891 ) writes:
        
        Holy hell, you think this has something to do with language processing? I didn't realize how far off your base understanding was. This matches patterns like vocal pitch and, at best, cadence. You should read the Wikipedia articles on machine learning and neural networks. Processing a voice is a million miles away from anything remotely related to language.
        
        Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        Not at all... I am saying that virtually everything that you ever learn how to do takes a shitpile of data input before you know how to do it correctly. Even your example of hypothetically making a sandwich without having ever seen a kitchen would itself draw on a vast breadth of knowledge acquired from numerous sources, most of which you would probably not even be consciously aware of.
        As it happens, I picked something else also related to listening... it was not a deliberate choice.
        
        Re: (Score:1)
        
        by freedizzle ( 6552682 ) writes:
        
        > I could have never been in a kitchen in my life but still figure out how to make a sandwich relatively easily
        This sounds like a pretty rigged test of intelligence if you ask me. Same deal for your assertion that writing software to separate patterns in audio streams is a sign of intelligence. If your yardstick for intelligence is a piecewise function that cuts off at anything less than yours... well then, don't worry about me, I'm not intelligent, you can have it all. :)
Most importantly (Score:3)

by TomR teh Pirate ( 1554037 ) writes: on Friday November 13, 2020 @02:58PM (#60720648)

This could be a massive step forward for karaoke

- Re: (Score:1)
  
  by Jemm ( 747958 ) writes:
  
  There already are excellent systems to remove voices from songs.
DMCA proofing videos? (Score:2)

by bjwest ( 14070 ) writes:

Can it remove the music from Twitch VODs? Too bad everyone's already deleted them all though.
blind signal separation (Score:2)

by nsaspook ( 20301 ) writes:

http://www.irisa.fr/metiss/oze... [irisa.fr]
Another use (Score:2)

by spaceman375 ( 780812 ) writes:

This tech could pull each of the instruments out of an old, low-res recording separately. Remix the tracks, and you've got a cleaner recording than the original low tech tapes. They are already doing this for video, upgrading movies from long ago to 4k or better. This fits right in with that toolbox. They should do the original Jetsons theme just to celebrate, right after rescuing something like the earliest Edison cylinders. They could even clean up a recording of the recording so as not to play the delica
This has potential for the hearing impaired. (Score:1)

by Jemm ( 747958 ) writes:

and anyone watching a Nolan film.
