
How Google's Pixel 2 'Now Playing' Song Identification Works (venturebeat.com) 129

An anonymous reader shares a report from VentureBeat, written by Emil Protalinski: The most interesting Google Pixel 2 and Pixel 2 XL feature, to me, is Now Playing. If you've ever used Shazam or SoundHound, you probably understand the basics: The app uses your device's microphone to capture an audio sample and creates an acoustic fingerprint to compare against a central song database. If a match is found, information such as the song title and artist are sent back to the user. Now Playing achieves this with two important differentiators. First, Now Playing detects songs automatically without you explicitly asking -- the feature works when your phone is locked and the information is displayed on the Pixel 2's lock screen (you'll eventually be able to ask Google Assistant what's currently playing, but not yet). Secondly, it's an on-device and local feature: Now Playing functions completely offline (we tested this, and indeed it works with mobile data and Wi-Fi turned off). No audio is ever sent to Google.

  • works offline? (Score:3, Insightful)

    by Anonymous Coward on Thursday October 19, 2017 @09:09PM (#55400817)

    How in the actual fuck is this possible? They have an audio signature of every song built in?

    • Re: (Score:1, Informative)

      by Anonymous Coward

      Sure, why not? Do you honestly think that something which amounts to a checksum takes very much space? Probably a few bytes per song.

      If we figure 32 bytes per song times 50,000 songs, that's only like 1.6MB of space needed.
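
A quick check of that back-of-the-envelope figure. The 32 bytes per song and the 50,000-song count are the poster's guesses, not published numbers, and `database_size_mb` is just a name made up for this sketch.

```python
def database_size_mb(bytes_per_song: int, song_count: int) -> float:
    """Size of a flat fingerprint table in megabytes (1 MB = 1,000,000 bytes)."""
    return bytes_per_song * song_count / 1_000_000


# Guessed values from the comment above, not real figures.
print(database_size_mb(32, 50_000))  # 1.6
```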

      • Did you account for the titles of the songs? Names of the albums?

        • by Anonymous Coward

          Album names are unimportant, as the same song can be featured on multiple albums. People only care about artist and song, and text compresses extremely well, so, to be generous, go ahead and double that value to accommodate it.

          A 3.2MB local database is probably not going to be a big deal for anyone.

          • by Anonymous Coward

            As a follow up to this, I just conducted a quick test by piping a full listing of my music library into a text file.

            That's 32,078 tracks listing the FULL PATH which includes the drive letter, "Music" directory, artist name, album year, album name, disc number, track number, artist name (again), track name and MP3 file extension. A single sample looks like this:

            D:\Music\65daysofstatic\[2016] No Man's Sky - Music For An Infinite Universe\Disc 01\01 - 65daysofstatic [Monolith].mp3

            The size of the raw text file is 3.26MB, compressed as ZIP it's 515KB and compressed as 7z it's 392KB.
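
For anyone who wants to repeat this kind of test without a real music library, here is a rough sketch using only the Python standard library. The listing is synthetic (made-up artist, album and track names in the same path shape), so it is far more repetitive than a real collection and will compress better than the figures quoted above. `zlib` stands in for ZIP's DEFLATE and `lzma` for 7z's default method; neither writes an actual archive.

```python
import lzma
import zlib


def build_listing(track_count: int) -> bytes:
    """Build a fake full-path listing, one track per line (hypothetical names)."""
    lines = []
    for i in range(track_count):
        artist = f"Artist {i % 400:03d}"
        album = f"[{1990 + i % 30}] Album {i % 50:02d}"
        track = i % 12 + 1
        lines.append(
            f"D:\\Music\\{artist}\\{album}\\Disc 01\\{track:02d} - {artist} [Track {i:05d}].mp3"
        )
    return "\n".join(lines).encode("utf-8")


raw = build_listing(32_078)          # same track count as the test above
deflated = zlib.compress(raw, 9)     # DEFLATE, the algorithm ZIP normally uses
xz = lzma.compress(raw)              # LZMA, the family 7z uses by default

for label, blob in (("raw", raw), ("DEFLATE", deflated), ("LZMA", xz)):
    print(f"{label}: {len(blob) / 1_000_000:.2f} MB")
```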

            • by Anonymous Coward

              Further, most of the data in a music library consists of the song audio itself. You haven't accounted for the system's ability to identify a song based on a fragment of its content in spite of the lossy audio channel. It's got to be substantially more than just a hash and a title being stored.

              If I would venture my own guess, and let me stress that this is a guess, I would speculate that perhaps Google has devised some sort of fractal audio fingerprinting and song indexing scheme.

              • It's got to be substantially more than just a hash and a title being stored.

                Why is that? A fingerprint contains a huge amount of data, yet a fingerprint reader will condense it down into a small number of points and represent the entire thing as a digit or string of digits. Why can't a computer do the same with music, creating what is essentially a hash and matching that to a database of titles?

                • Which few seconds of When the Levee Breaks would be sufficient for a fingerprint? If the system is always listening, it always gets the beginning, so it needs that, no more. SoundHound needs more since it gets called at any point in a track.

                  Somehow, though I wonder - if the music is playing on my device, I loaded it on there. The metadata must be somewhere, though if not, then I got an unlabeled track. Really? That's possible, but the value proposition escapes me.

                  • If the system is always listening, it always gets the beginning, so it needs that, no more.

                    That's the one thing I couldn't understand about their whole system. Thank you.

                    • If the system is always listening, it always gets the beginning, so it needs that, no more.

                      That's the one thing I couldn't understand about their whole system. Thank you.

                      If you enter the premise (restaurant, bar, etc.) mid-song, then the phone won't have access to the beginning of the song.

                      Assuming Pixel 2 can still identify a song that it only heard from the middle, then the audio fingerprint must cover characteristics of the entire song.

                      Another poster mentioned how a scan of a human fingerprint stores only specific interesting datapoints. So maybe Google's audio fingerprint includes a few bytes representing average BPM, a few bytes representing vocal range, etc. I'm not

                  • by Megol ( 3135005 )

                    Well there are plenty of songs that start with silence... ;P

                  • Which few seconds of When the Levee Breaks would be sufficient for a fingerprint? If the system is always listening, it always gets the beginning, so it needs that, no more. SoundHound needs more since it gets called at any point in a track.

                    Somehow, though I wonder - if the music is playing on my device, I loaded it on there. The metadata must be somewhere, though if not, then I got an unlabeled track. Really? That's possible, but the value proposition escapes me.

                    It doesn't get the beginning if you turn on the radio half way through the song.

          • A 3.2MB local database is probably not going to be a big deal for anyone.

            More than two floppy disks full of data? Not a big deal? What has this world come to...

      • Do you honestly think that something which amounts to a checksum takes very much space? Probably a few bytes per song.

        But you would need a ton of checksums per song, since they're only capturing what, 10 seconds of each song? You'd need a checksum for every possible 10-second window of every song.
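
To put a rough number on "a ton of checksums", here is a sketch of the sliding-window count. Every input is an assumption made for illustration: a 4-minute song, a 10-second window that moves in 1-second steps, and the 32-byte and 50,000-song guesses from earlier in the thread. Nothing here describes how Now Playing actually stores its data.

```python
def windows_per_song(song_seconds: int, window_seconds: int, step_seconds: int) -> int:
    """Number of fixed-length windows that fit when sliding by step_seconds."""
    if song_seconds < window_seconds:
        return 0
    return (song_seconds - window_seconds) // step_seconds + 1


per_song = windows_per_song(240, 10, 1)      # 4-minute song, 10 s window, 1 s step
total_bytes = per_song * 32 * 50_000         # 32 bytes per window, 50,000 songs (guesses)
print(per_song, total_bytes / 1_000_000)     # 231 369.6
```

Under those assumptions the table grows from a couple of megabytes to a few hundred, which is why the step size and the per-window size matter far more than the title text.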

      • It uses a microphone and analog to digital converter, there is background noise, and they don't have a known start and stop point. The incoming bitstream is not by any stretch of the imagination an invariant, so however it works "that ain't it."
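
A small sketch of why a plain checksum "ain't it" while a coarser feature can survive the microphone. It builds a toy signal of four pure tones, makes a quieter, noisy copy, and compares a SHA-256 of the raw samples against the loudest frequency bin in each frame. Picking spectral peaks is only one commonly described fingerprinting idea, used here as a stand-in; the source does not say what Google does. The tones, noise level and frame size are arbitrary choices for the demo, and the naive DFT takes a moment in pure Python.

```python
import cmath
import hashlib
import math
import random

SAMPLE_RATE = 8000
FRAME = 256  # samples per analysis frame; bin width = 8000 / 256 = 31.25 Hz


def tone_sequence(bins, frames_per_note=2):
    """Pure tones centred on the given DFT bins, played one after another."""
    samples = []
    for b in bins:
        freq = b * SAMPLE_RATE / FRAME
        for n in range(FRAME * frames_per_note):
            samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def checksum(samples):
    """SHA-256 over the samples quantised to 16-bit PCM."""
    pcm = b"".join(
        int(max(-1.0, min(1.0, s)) * 32767).to_bytes(2, "little", signed=True)
        for s in samples
    )
    return hashlib.sha256(pcm).hexdigest()


def peak_bins(samples):
    """Index of the strongest DFT bin in each non-overlapping frame."""
    peaks = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        best_bin, best_mag = 0, 0.0
        for k in range(1, FRAME // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / FRAME)
                      for n, x in enumerate(frame))
            if abs(acc) > best_mag:
                best_bin, best_mag = k, abs(acc)
        peaks.append(best_bin)
    return peaks


rng = random.Random(0)
clean = tone_sequence([14, 21, 28, 18])
# Simulated microphone capture: quieter, with a little additive noise.
noisy = [0.8 * s + rng.uniform(-0.05, 0.05) for s in clean]

print(checksum(clean) == checksum(noisy))    # False
print(peak_bins(clean) == peak_bins(noisy))  # True
```
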
    • Re:works offline? (Score:5, Informative)

      by lucm ( 889690 ) on Thursday October 19, 2017 @09:42PM (#55400967)

      How in the actual fuck is this possible? They have an audio signature of every song built in?

      Yes. And this is not surprising; the data needed to identify songs is tiny. Essentially it's just vectors (big numerical arrays); they don't need to store the whole mp3.

      More and more can be done locally on the devices. For instance, look at what is actually needed to detect English speech using CMU sphinx:
      https://github.com/cmusphinx/p... [github.com]
      (look at the hmm model)

      This used to require huge computing power and storage, but now it can work on a mobile device.

      Another example: once upon a time you needed Google datacenters to do gender and age recognition on photos. Now you can download pre-trained models for that, and the result can fit on a mobile device. Or you can download the entire dataset (500k photos of celebs) and train it yourself on your own servers;
      https://data.vision.ee.ethz.ch... [ee.ethz.ch]

      Or you want a model to recognize basically any kind of object in a photo?
      https://github.com/tensorflow/... [github.com]
      (there's a model specifically designed to run on mobile devices)

      I know it's disturbing but this is where things are today. Just a few years ago, this XKCD comic was true:

      https://xkcd.com/1425/ [xkcd.com]

      Now you can actually download the code and models to do that completely offline and in a few ms.
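
To make the "it's just vectors" point concrete, here is a minimal sketch of matching a query vector against a tiny local table by cosine similarity. The four-number vectors, the song names and the 0.9 threshold are all invented for illustration; real fingerprints would be much larger, and nothing here is taken from Google's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query, database, threshold=0.9):
    """Return the closest title, or None if nothing is similar enough."""
    title, score = max(
        ((t, cosine_similarity(query, v)) for t, v in database.items()),
        key=lambda pair: pair[1],
    )
    return title if score >= threshold else None


# Hypothetical fingerprints, far smaller than anything realistic.
database = {
    "Song A": [0.9, 0.1, 0.0, 0.3],
    "Song B": [0.1, 0.8, 0.5, 0.0],
    "Song C": [0.2, 0.2, 0.9, 0.6],
}

print(best_match([0.85, 0.15, 0.05, 0.25], database))  # Song A
print(best_match([0.5, 0.5, 0.5, 0.5], database))      # None
```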

      • yet, the actual results database is so bad, that shazam and google both fail to identify most of the jazz songs I try with their service.

        now I just record in whatsapp and send to a music friend who replies not only with the correct track but a few similar suggestions.

        screw ai.

        • yet, the actual results database is so bad, that shazam and google both fail to identify most of the jazz songs I try with their service.

          now I just record in whatsapp and send to a music friend who replies not only with the correct track but a few similar suggestions.

          screw ai.

          Pareto principle, amigo. Pop, hip-hop and country will always take the bulk of the market, and therefore the bulk of the attention of this kind of tool.

          But instead of seeing this as a problem, maybe you should see it as an opportunity. You probably have a huge collection of the kind of music you like; you could build specialized models and sell them. Or even better, create a specialized shazam-like app where people can purchase those additional models, and you'll make good money, then Google will buy you out

        • Jazz is famous for improvisation/jamming and having many, many recordings of any given song. By comparison, a few notes and some rhythm will pretty much nail most commercial pop songs. Different style of music.
        • by Anonymous Coward on Friday October 20, 2017 @05:40AM (#55402111)

          jazz songs?

          why do you want to stress their app by sending random data?

          • Rock songs are three chords played in front of thousands of people.

            Jazz songs are thousands of chords played in front of three people.
        • by Hall ( 962 )

          yet, the actual results database is so bad, that shazam and google both fail to identify most of the jazz songs I try with their service.

          Shazam and the like frequently fail to ID classical songs, even the most well-known ones. I presume it's because these services haven't "fingerprinted" every different orchestra's or symphony's version of different songs. Even though the notes are the same, there are no doubt differences that are detected by these fingerprinting mechanisms.

      • It helps if you only listen to the top 40.

        If you're listening to Ayurvedic by Ozric Tentacles, that might not be in the on-device fingerprint database.

      • They aren't analyzing MP3s. If you record a song played through speakers and picked up by a microphone then converted to digital no two MP3s will have the same signature.
    • by Anonymous Coward

      Simple, they just upload a copy of the audio through the DMCA filter they have set up for Youtube and parse out the artist/title from the resulting take-down notification.

      And yes I did read that it is done off-line but that's not as funny.

  • by Anonymous Coward

    "The Pixel 2’s on-device database for Now Playing is based on Google Play Music’s top songs, the Google spokesperson revealed. Google wouldn’t share the exact number of songs in the database, but the spokesperson did note it’s in the high 10s of thousand"

    So it's only top songs, and only a limited number.
    Aren't these things meant to have high numbers of songs so that you can find out what that obscure song is, not just the latest Taylor Swift one?

    • I'm guessing it tries to phone home when it doesn't find a local match. I doubt VentureBeat tried it with anything obscure, but I highly doubt Google is gonna give a "we don't know" answer without exhausting their resources.
      • Re:small database (Score:4, Informative)

        by TranquilVoid ( 2444228 ) on Thursday October 19, 2017 @11:10PM (#55401287)

        According to the article the local song database is updated once per week based on the changing popularity of songs on Google Play. The least popular songs are replaced rather than expanding the database in perpetuity, and if you never enable the feature the database is never downloaded.
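
The weekly swap described here can be sketched as a fixed-capacity table that keeps only the current top songs. The function name, the popularity scores and the `fetch_fingerprint` callback are hypothetical stand-ins; the article gives the behaviour (weekly update, least popular songs replaced), not the mechanism.

```python
def weekly_refresh(local_db, popularity, capacity, fetch_fingerprint):
    """Keep the `capacity` most popular songs; fetch new entries, drop the rest."""
    keep = set(sorted(popularity, key=popularity.get, reverse=True)[:capacity])
    evicted = sorted(set(local_db) - keep)
    added = sorted(keep - set(local_db))
    # Reuse fingerprints already on the device, download only the newcomers.
    new_db = {song: local_db[song] for song in keep & set(local_db)}
    for song in added:
        new_db[song] = fetch_fingerprint(song)
    return new_db, added, evicted


local = {"old hit": b"\x01", "steady seller": b"\x02", "fading tune": b"\x03"}
charts = {"old hit": 50, "steady seller": 80, "fading tune": 5, "new single": 95}

db, added, evicted = weekly_refresh(local, charts, 3, lambda s: s.encode())
print(sorted(db), added, evicted)
# ['new single', 'old hit', 'steady seller'] ['new single'] ['fading tune']
```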

        • That doesn't address how it might identify songs not in the local database. The database holds tens of thousands of songs, but songs which are removed from the database don't cease to exist and I'm guessing it will phone home to identify them.
          • Sorry, I just assumed you'd come to the same conclusion as I did, which is that it simply will fail to identify the songs and not phone home to look up a master database. It's a good point, even if my assumption is correct, nothing says they won't change it to do a dynamic lookup in the future without notification.

            • Sorry, I just assumed you'd come to the same conclusion as I did, which is that it simply will fail to identify the songs and not phone home to look up a master database.

              What part of my initial comment made you think I would have come to that conclusion? The first thing I said was literally:

              I'm guessing it tries to phone home when it doesn't find a local match.

    • by Anonymous Coward

      Sure, but this is just "ambient" song recognition. If you specifically want to look up a song, you can still ask the Assistant, or tap the mic button in the google search widget or the Play Music app, and it will send audio to the cloud and get you a quicker & more accurate response from the much larger online database.

    • Aren't these things meant to have high numbers of songs so that you can find out what that obscure song is, not just the latest Taylor Swift one?

      Uh... no. The point is to ID that song that your friends were listening to and you are hearing it again and you want to know what to call it so you can impress your friends next time it comes on, or better still, sell you a $0.99 copy to download to your phone.

      Obscure songs are... obscure. Use the online database if you want to ID obscure stuff.

  • by nospam007 ( 722110 ) * on Thursday October 19, 2017 @09:31PM (#55400917)

    So it knows every song played in the gym, why can't it be used to _tune out_ that song playing right now with my noise-suppressing headphones, since it knows every tone in advance and doesn't have to guess?
    Then I could hear _my_ music playing in my headphones.
    Would also be great for the barmen in nightclubs and other places playing loud music.

    • Re:OK (Score:5, Interesting)

      by Dan East ( 318230 ) on Thursday October 19, 2017 @10:28PM (#55401145) Journal

      Although I think you're being funny, no, this couldn't be used in that way. Noise cancelling headphones work by using destructive interference, which requires an exact opposite waveform of the sound being cancelled out. Since the analog waveform of the music would be affected by any number of factors (the quality of the speakers playing it, the equalizer settings of their audio equipment, the bitrate of their source, the echoing of the sound off various objects, multiple speakers playing the audio, which would result in multiple "copies" of the music reaching your ear just very slightly delayed from one another, etc, etc), you couldn't use a "canned" waveform (the original MP3) to cancel out the actual waveform reaching your ears.

      Now, while it might be possible, using AI, to try to do a best match of the ambient sound against a canned waveform, and cancel out only the ambient sound that seems to match, it still would not work perfectly. That would result in echoes and certain portions of the frequency spectrum still being heard, which would sound very strange.
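
A toy illustration of the destructive-interference point, using a single 1 kHz tone. An exact inverted copy cancels completely, while a "canned" copy that is slightly late and slightly quieter does not. The 11-sample delay and 0.9 gain are arbitrary stand-ins for room and equipment effects, chosen for the demo; real rooms are far messier than one delay and one gain.

```python
import math

SAMPLE_RATE = 44_100


def tone(freq, n_samples, delay_samples=0, gain=1.0):
    """A sine tone, optionally delayed and scaled."""
    return [gain * math.sin(2 * math.pi * freq * (n - delay_samples) / SAMPLE_RATE)
            for n in range(n_samples)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


n = 4410  # 0.1 s of audio
heard = tone(1000, n)                                         # what reaches the ear
exact_anti = [-x for x in heard]                              # perfect inverse
canned_anti = [-x for x in tone(1000, n, delay_samples=11, gain=0.9)]  # mistimed reference

exact_residual = [a + b for a, b in zip(heard, exact_anti)]
canned_residual = [a + b for a, b in zip(heard, canned_anti)]

print(rms(exact_residual))                 # 0.0
print(rms(canned_residual) > rms(heard))   # True
```

At 1 kHz an 11-sample offset is about a quarter of a cycle, so in this example the mistimed "anti-noise" leaves more sound than it removes.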

      • As you say, it can't use a canned MP3 to generate a noise cancelling waveform... but since the device is listening and can compare the acoustic properties to the reference master, it should be able to make a mapping that could get pretty close.

    • by Megol ( 3135005 )

      Cool idea but probably hard to make work... Probably, there are a lot of smart people.

  • I'll bet Google Pixel 2 won't be able to identify the songs on my playlist.

    https://youtu.be/99KkbFjZR20 [youtu.be]

  • Weasel Words (Score:4, Interesting)

    by PPH ( 736903 ) on Thursday October 19, 2017 @10:32PM (#55401159)

    No audio is ever sent to Google.

    No. But the playlist along with location data probably is. Either in real time or forwarded when the network becomes available.

    And then this is turned over to ASCAP/BMI to verify that commercial establishments you were in have paid their fees.

    • Re: (Score:1, Insightful)

      by Anonymous Coward

      This is google we are talking about. ALL your personal data is immediately supplied to the Democrat party and the NSA/CIA deep state crisis actors who are WAITING to use it to confiscate your guns and imprison you and your family.

  • But then pre-packages the information to be sent surreptitiously with your daily or weekly Google report.
    I dunno what people find so useful in assistants, crap like this, plus embedded AI to identify objects when you are taking photos and whatnot, but I see it as overengineering stuff and offering little to no convenience (when not actually making things even harder to do) while using you as testbed for future data collection schemes and whatnot.

    This function in particular can only go both ways: either collec

  • by 93 Escort Wagon ( 326346 ) on Thursday October 19, 2017 @11:26PM (#55401343)

    Google, Samsung, Apple... all their phones can now do pretty much everything their customers need, and are powerful enough where there’s not much practical gain in upgrading. These companies are basically stuck trying to sell us high priced gadgets which in truth are pretty much commodities now.

    This new feature from Google doesn’t seem useful to me at all. But, given the choice between a phone able to do this and a phone which can turn me into an animated, talking poop emoji... I’d take this, thank you very much.

  • So everyone knows there exist several shazam-like apps and services that will identify a song that's playing. I have personally used this service maybe two times within the past four years. The 500MB of storage space wasted on the Now Playing database is just not worth it.

  • Google Assistant / Now has been doing it for years too. You know, the core of the platform you're talking about.

  • OK, so maybe it doesn't do it offline. But who cares? I can already ask my Moto G "What song is this?" It listens, and tells me what's playing. I've used it numerous times, and it got it right every time!

  • Why is this even a feature? If I put it on my device, I likely know the artist and album. If it's playing from an online service, they know the album and artist. Or, is having the microphone on all the time the real feature?
    • It's for identifying songs that you hear from random sources (boom boxes, restaurant background music, etc.) that you didn't load and/or aren't coming from your own device.

      And the 'Always On' (intermittent, actually - it runs every 60 seconds) means that you can get the info from your lock screen, and the local database means that it doesn't need any form of internet connection - no impact on your data plan.

      Not that I fit their use case. I'd leave it switched off (the default).
