How Google's Pixel 2 'Now Playing' Song Identification Works (venturebeat.com) 129
An anonymous reader shares a report from VentureBeat, written by Emil Protalinski: The most interesting Google Pixel 2 and Pixel 2 XL feature, to me, is Now Playing. If you've ever used Shazam or SoundHound, you probably understand the basics: The app uses your device's microphone to capture an audio sample and creates an acoustic fingerprint to compare against a central song database. If a match is found, information such as the song title and artist are sent back to the user. Now Playing achieves this with two important differentiators. First, Now Playing detects songs automatically without you explicitly asking -- the feature works when your phone is locked and the information is displayed on the Pixel 2's lock screen (you'll eventually be able to ask Google Assistant what's currently playing, but not yet). Secondly, it's an on-device and local feature: Now Playing functions completely offline (we tested this, and indeed it works with mobile data and Wi-Fi turned off). No audio is ever sent to Google.
Re:How big is an "acoustic fingerprint"? (Score:5, Informative)
Yet another lump of unremovable pre-installed stuff taking precious space on your phone.
If you don't turn it on, it doesn't ever download the fingerprint database.
works offline? (Score:3, Insightful)
How in the actual fuck is this possible? They have an audio signature of every song built in?
Re: (Score:1, Informative)
Sure, why not? Do you honestly think that something which amounts to a checksum takes very much space? Probably a few bytes per song.
If we figure 32 bytes per song times 50,000 songs, that's only like 1.6MB of space needed.
Re: (Score:2)
Did you account for the titles of the songs? Names of the albums?
Re: (Score:1)
Names of albums are unimportant, as the same song can be featured on multiple albums. People only care about artist and song, and text compresses extremely well, so, to be generous, go ahead and double that value to accommodate.
A 3.2MB local database is probably not going to be a big deal for anyone.
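For anyone who wants to check the arithmetic, here is a rough sketch. Every input is a guess from this thread (32 bytes per fingerprint, 50,000 songs, doubling for artist and title text), not a figure published by Google, and the function name is made up for illustration.

```python
# Back-of-envelope estimate. All inputs are guesses from the thread,
# not numbers published by Google.
BYTES_PER_FINGERPRINT = 32   # assumed fingerprint size per song
SONG_COUNT = 50_000          # assumed number of songs in the database
METADATA_FACTOR = 2          # "double that value" to cover artist and title


def database_size_mb(bytes_per_song, songs, factor=1):
    """Estimated size in megabytes, taking 1 MB = 1,000,000 bytes."""
    return bytes_per_song * songs * factor / 1_000_000


fingerprints_only = database_size_mb(BYTES_PER_FINGERPRINT, SONG_COUNT)
with_metadata = database_size_mb(BYTES_PER_FINGERPRINT, SONG_COUNT, METADATA_FACTOR)

print(f"fingerprints only: {fingerprints_only:.1f} MB")
print(f"with artist/title: {with_metadata:.1f} MB")
```

Changing the assumed bytes per song is the interesting knob: if a real fingerprint needs kilobytes rather than bytes, the total grows in direct proportion.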
Re: (Score:1)
As a follow up to this, I just conducted a quick test by piping a full listing of my music library into a text file.
That's 32,078 tracks listing the FULL PATH which includes the drive letter, "Music" directory, artist name, album year, album name, disc number, track number, artist name (again), track name and MP3 file extension. A single sample looks like this:
D:\Music\65daysofstatic\[2016] No Man's Sky - Music For An Infinite Universe\Disc 01\01 - 65daysofstatic [Monolith].mp3
The size of the raw text file is 3.26MB, compressed as ZIP it's 515KB and compressed as 7z it's 392KB.
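If you want to repeat the test without a real music library, something like the sketch below should do. The track listing is synthetic (made-up artist and album names in the same shape as the sample path), so it is far more repetitive than a real library and will compress better than the figures quoted above. `zlib` stands in for ZIP's usual DEFLATE and `lzma` for 7z's usual LZMA; neither produces an actual .zip or .7z archive.

```python
import lzma
import zlib

# Hypothetical listing: 200 made-up artists with 12 tracks each, using the
# same path layout as the sample above.
paths = [
    f"D:\\Music\\Artist {a:03d}\\[{2000 + a % 20}] Album {a:03d}\\Disc 01\\"
    f"{t:02d} - Artist {a:03d} [Track {t:02d}].mp3"
    for a in range(200)
    for t in range(1, 13)
]
raw = "\n".join(paths).encode("utf-8")

deflated = zlib.compress(raw, 9)   # DEFLATE at maximum compression level
xz = lzma.compress(raw)            # LZMA with default settings

print(f"raw:     {len(raw):>8} bytes")
print(f"deflate: {len(deflated):>8} bytes")
print(f"lzma:    {len(xz):>8} bytes")
```

To try it on a real library, replace `paths` with the lines of your own listing file.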
Re: works offline? (Score:1)
Further, most of the data in a music library consists of the song audio itself. You haven't accounted for the system's ability to identify a song based on a fragment of its content in spite of the lossy audio channel. It's got to be substantially more than just a hash and a title being stored.
If I would venture my own guess, and let me stress that this is a guess, I would speculate that perhaps Google has devised some sort of fractal audio fingerprinting and song indexing scheme.
Re: (Score:2)
Why is that? A fingerprint contains a huge amount of data, yet a fingerprint reader will condense it down into a small number of points and represent the entire thing as a digit or string of digits. Why can't a computer do the same with music, creating what is essentially a hash and matching that to a database of titles?
Re: (Score:3)
Which few seconds of When the Levee Breaks would be sufficient for a fingerprint? If the system is always listening, it always gets the beginning, so it needs that, no more. SoundHound needs more since it gets called at any point in a track.
Somehow, though I wonder - if the music is playing on my device, I loaded it on there. The metadata must be somewhere, though if not, then I got an unlabeled track. Really? That's possible, but the value proposition escapes me.
Re: (Score:2)
That's the one thing I couldn't understand about their whole system. Thank you.
Re: (Score:2)
That's the one thing I couldn't understand about their whole system. Thank you.
If you enter the premise (restaurant, bar, etc.) mid-song, then the phone won't have access to the beginning of the song.
Assuming Pixel 2 can still identify a song that it only heard from the middle, then the audio fingerprint must cover characteristics of the entire song.
Another poster mentioned how a scan of a human fingerprint stores only specific interesting datapoints. So maybe Google's audio fingerprint includes a few bytes representing average BPM, a few bytes representing vocal range, etc. I'm not
Re: (Score:2)
Well there are plenty of songs that start with silence... ;P
Re: (Score:2)
Which few seconds of When the Levee Breaks would be sufficient for a fingerprint? If the system is always listening, it always gets the beginning, so it needs that, no more. SoundHound needs more since it gets called at any point in a track.
Somehow, though I wonder - if the music is playing on my device, I loaded it on there. The metadata must be somewhere, though if not, then I got an unlabeled track. Really? That's possible, but the value proposition escapes me.
It doesn't get the beginning if you turn on the radio half way through the song.
Re: works offline? (Score:5, Interesting)
32 thousand CDs, using slim jewel cases at 5mm thickness, means you have a CD tower 160 metres tall. Given a standard height of three metres per floor, your CD stack is over 53 stories high.
Re: (Score:2)
He said he has as many CDs as the poster above has of songs (32k).
Re: (Score:3)
And at 16 grams per CD, your collection weighs 512 kilograms, or over half a metric ton without the slim jewel cases.
With the slim jewel cases, with a weight of around 43 grams each, your collection weighs a total of 1.888 metric tons.
The E.P.A.'s weight statistics show that the average weight of a 2003 car or light-duty truck, like a pickup, sport utility, van or minivan, was heavier than in any model year since 1976, when the average peaked at 4,079 pounds (1850 kg).
Congratulations on your CD collection
Re: (Score:2)
His library is certainly not small by any means. Your library is completely out of the ordinary though.
Re: (Score:1)
A 3.2MB local database is probably not going to be a big deal for anyone.
More than two floppy disks full of data? Not a big deal? What has this world come to...
Re: (Score:2)
I remember King's Quest... something... had around 14 floppies.
Re: (Score:1)
Do you honestly think that something which amounts to a checksum takes very much space? Probably a few bytes per song.
But you would need a ton of checksums per song, since they're only capturing what - 10 seconds of each song. You'd need a checksum for every possible 10 second window of every song.
Re: works offline? (Score:4, Insightful)
Checksum is shorthand terminology - music fingerprinting [wikipedia.org] is probably more accurate, but everyone knew what they meant.
Re: (Score:2)
Did they call it a checksum or did they say it "amounts to a checksum"? The latter is actually true.
Re: (Score:2)
You evidently don't understand metaphor, figurative language, and idioms.
Re:works offline? (Score:5, Informative)
How in the actual fuck is this possible? They have an audio signature of every song built in?
Yes. And this is not surprising; the data needed to identify songs is tiny. Essentially it's just vectors (big numerical arrays); they don't need to store the whole mp3.
More and more can be done locally on the devices. For instance, look at what is actually needed to detect English speech using CMU sphinx:
https://github.com/cmusphinx/p... [github.com]
(look at the hmm model)
This used to require huge computing power and storage, but now it can work on a mobile device.
Another example: once upon a time you needed Google datacenters to do gender and age recognition on photos. Now you can download pre-trained models for that, and the result can fit on a mobile device. Or you can download the entire dataset (500k photos of celebs) and train it yourself on your own servers;
https://data.vision.ee.ethz.ch... [ee.ethz.ch]
Or you want a model to recognize basically any kind of object in a photo?
https://github.com/tensorflow/... [github.com]
(there's a model specifically designed to run on mobile devices)
I know it's disturbing, but this is where things are today. Just a few years ago, this XKCD comic was true:
https://xkcd.com/1425/ [xkcd.com]
Now you can actually download the code and models to do that completely offline and in a few ms.
Re: works offline? (Score:2)
Yet the actual results database is so bad that Shazam and Google both fail to identify most of the jazz songs I try with their services.
Now I just record in WhatsApp and send it to a music friend, who replies not only with the correct track but with a few similar suggestions.
screw ai.
lemon, meet lemonade (Score:2)
Yet the actual results database is so bad that Shazam and Google both fail to identify most of the jazz songs I try with their services.
Now I just record in WhatsApp and send it to a music friend, who replies not only with the correct track but with a few similar suggestions.
screw ai.
Pareto principle, amigo. Pop, hip-hop and country will always take the bulk of the market, and therefore the bulk of the attention of this kind of tool.
But instead of seeing this as a problem, maybe you should see it as an opportunity. You probably have a huge collection of the kind of music you like; you could build specialized models and sell them. Or even better, create a specialized Shazam-like app where people can purchase those additional models, and you'll make good money, then Google will buy you out
Re: works offline? (Score:4, Funny)
Jazz songs?
Why do you want to stress their app by sending random data?
Re: (Score:2)
Jazz songs are thousands of chords played in front of three people.
Re: (Score:2)
Yet the actual results database is so bad that Shazam and Google both fail to identify most of the jazz songs I try with their services.
Shazam and the like frequently fail to ID classical songs, even the most well-known ones. I presume it's because these services haven't "fingerprinted" every different orchestra's or symphony's version of different songs. Even though the notes are the same, there are no doubt differences that are detected by these fingerprinting mechanisms.
Re: (Score:2)
It helps if you only listen to the top 40.
If you're listening to Ayurvedic by Ozric Tentacles, that might not be in the on-device fingerprint database.
Re:works offline? (Score:5, Insightful)
Why shit on mp3 and try to re-invent the wheel with vectors?
First, nobody is shitting on mp3. As for the reason to use tiny vectors instead of storing big mp3 files, I'm not sure why I have to explain it to you, but it comes down to two things.
1) Storage
2) Availability of advanced, high quality vector processing libraries like BLAS or LAPACK
That being said, it was just my guess; for all I know, maybe they are storing data in sqlite3 or in the headers of a jpeg file that shows your mom pleasuring herself with a maglite.
Must be new here (Score:3)
What a shame that MP3s are so huge and complex that Google has a storage and indexing capacity problem
What a shame that you didn't RTFA and missed the part where they explain that it works while the device is not connected to internet
Re: (Score:2)
Yeah, that would also be copyright infringement. There's lots of legal reasons why you want to distance yourself more than that.
You don't need to store the actual tones/frequencies - just the relative intensity across a few data points.
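To make "relative intensity" concrete, here is a toy sketch. It is not Google's or Shazam's method, and every name in it is made up for illustration. It stores one bit per pair of adjacent frames (did the energy go up or down?), which makes the fingerprint independent of playback volume, and then slides a clip's bits along the song's bits to find where it fits. The clip is assumed to start exactly on a frame boundary and to contain no added noise, both of which a real system would have to cope with.

```python
import math
import random

FRAME = 256  # samples per frame (arbitrary choice for this sketch)


def frame_energies(samples):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + FRAME])
        for i in range(0, len(samples) - FRAME + 1, FRAME)
    ]


def fingerprint(samples):
    """One bit per pair of adjacent frames: 1 if energy rose, else 0."""
    e = frame_energies(samples)
    return [1 if later > earlier else 0 for earlier, later in zip(e, e[1:])]


def best_offset(song_bits, clip_bits):
    """Slide the clip along the song; return (frame offset, Hamming distance)."""
    candidates = (
        (sum(x != y for x, y in zip(song_bits[off:], clip_bits)), off)
        for off in range(len(song_bits) - len(clip_bits) + 1)
    )
    distance, offset = min(candidates)
    return offset, distance


# Synthetic "song": a tone whose loudness changes from frame to frame.
rng = random.Random(0)
song = []
for _ in range(100):
    loudness = rng.random()
    song.extend(loudness * math.sin(2 * math.pi * 8 * n / FRAME) for n in range(FRAME))

# A quieter clip taken from the middle of the song (frames 10 to 49).
clip = [0.3 * s for s in song[10 * FRAME:50 * FRAME]]

offset, distance = best_offset(fingerprint(song), fingerprint(clip))
print(f"best match at frame {offset} with {distance} differing bits")
```

The song's fingerprint here is 99 bits for 100 frames, which is the point of the comment above: the stored data is tiny compared with the audio, yet a fragment from the middle can still be located.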
Re: (Score:1)
Simple, they just upload a copy of the audio through the DMCA filter they have set up for Youtube and parse out the artist/title from the resulting take-down notification.
And yes I did read that it is done off-line but that's not as funny.
small database (Score:1)
"The Pixel 2’s on-device database for Now Playing is based on Google Play Music’s top songs, the Google spokesperson revealed. Google wouldn’t share the exact number of songs in the database, but the spokesperson did note it’s in the high 10s of thousands."
So it's only top songs, and only a limited number.
Aren't these things meant to have high numbers of songs so that you can find out what that obscure song is, not just the latest Taylor Swift one?
Re:small database (Score:4, Informative)
According to the article the local song database is updated once per week based on the changing popularity of songs on Google Play. The least popular songs are replaced rather than expanding the database in perpetuity, and if you never enable the feature the database is never downloaded.
Re: (Score:2)
Sorry, I just assumed you'd come to the same conclusion as I did, which is that it simply will fail to identify the songs and not phone home to look up a master database. It's a good point, even if my assumption is correct, nothing says they won't change it to do a dynamic lookup in the future without notification.
Re: (Score:2)
Sorry, I just assumed you'd come to the same conclusion as I did, which is that it simply will fail to identify the songs and not phone home to look up a master database.
What part of my initial comment made you think I would have come to that conclusion? The first thing I said was literally:
I'm guessing it tries to phone home when it doesn't find a local match.
Re: (Score:1)
Sure, but this is just "ambient" song recognition. If you specifically want to look up a song, you can still ask the Assistant, or tap the mic button in the Google search widget or the Play Music app, and it will send audio to the cloud and get you a quicker & more accurate response from the much larger online database.
Re: (Score:3)
Aren't these things meant to have high numbers of songs so that you can find out what that obscure song is, not just the latest taylor swift one?
Uh... no. The point is to ID that song that your friends were listening to and you are hearing it again and you want to know what to call it so you can impress your friends next time it comes on, or better still, sell you a $0.99 copy to download to your phone.
Obscure songs are... obscure. Use the online database if you want to ID obscure stuff.
OK (Score:3)
So it knows every song played in the gym. Why can't it be used to _tune out_ that song playing right now with my noise-suppressing headphones, since it knows every tone in advance and doesn't have to guess?
Then I could hear _my_ music playing in my headphones.
Would also be great for the barmen in nightclubs and other places playing loud music.
Re:OK (Score:5, Interesting)
Although I think you're being funny, no, this couldn't be used in that way. Noise cancelling headphones work by using destructive interference, which requires an exact opposite waveform of the sound being cancelled out. Since the analog waveform of the music would be affected by any number of factors (the quality of the speakers playing it, the equalizer settings of their audio equipment, the bitrate of their source, the echoing of the sound off various objects, multiple speakers playing the audio, which would result in multiple "copies" of the music reaching your ear just very slightly delayed from one another, etc, etc), you couldn't use a "canned" waveform (the original MP3) to cancel out the actual waveform reaching your ears.
Now, while it might be possible, using AI, to try to do a best match of the ambient sound against a canned waveform, and cancel out only the ambient sound that seems to match, it still would not work perfectly. That would result in echoes and certain portions of the frequency spectrum still being heard, which would sound very strange.
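A small sketch of why the canned waveform falls short. It models only one of the distortions listed above, a tiny delay between the canned signal and what actually reaches the ear, using a single 440 Hz tone; the sample rate, the tone, and the function names are arbitrary choices for illustration.

```python
import math

RATE = 8000   # samples per second (arbitrary for this sketch)
FREQ = 440    # test tone in Hz


def tone(n_samples, delay=0):
    """A sine tone, optionally delayed by a whole number of samples."""
    return [math.sin(2 * math.pi * FREQ * (n - delay) / RATE) for n in range(n_samples)]


def residual_energy(heard, anti_noise):
    """Energy left after the anti-noise signal is added to what the ear receives."""
    return sum((h + a) ** 2 for h, a in zip(heard, anti_noise))


canned = tone(800)
anti_noise = [-s for s in canned]   # exact inverse of the canned waveform

original = sum(s * s for s in canned)
# Case 1: the ear receives exactly the canned waveform.
perfect = residual_energy(canned, anti_noise)
# Case 2: the ear receives it 4 samples (0.5 ms) late.
delayed = residual_energy(tone(800, delay=4), anti_noise)

print(f"no cancellation:        {original:.1f}")
print(f"perfectly aligned:      {perfect:.1f}")
print(f"anti-noise 0.5 ms off:  {delayed:.1f}")
```

With perfect alignment the residual is zero. With the half-millisecond offset in this setup, the "cancelling" signal leaves more energy than doing nothing at all, and that is before speaker response, equalizers, and echoes are considered.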
Could still work (Score:2)
As you say, it can't use a canned MP3 to generate a noise cancelling waveform... but since the device is listening and can compare the acoustic properties to the reference master, it should be able to make a mapping that could get pretty close.
Re: (Score:2)
Streaming a song with the phase reversed is still streaming a song, and the maker would have to pay royalties.
Re: (Score:2)
Cool idea but probably hard to make work... Probably, there are a lot of smart people.
Bite it, you scum (Score:2)
I'll bet Google Pixel 2 won't be able to identify the songs on my playlist.
https://youtu.be/99KkbFjZR20 [youtu.be]
Re: (Score:2)
Google Pixel 2 doesn't know about GG Allin. I'm willing to bet on it.
Re: (Score:2)
Pfft, any artist that even has a Wikipedia entry, let alone one with as many words on it as GG Allin's, can hardly be described as obscure. You are so overground.
Re: (Score:2)
Not the live bootlegs.
Re: (Score:1)
You'd lose the bet; no history is kept - not even locally.
Re: (Score:1)
Why? Because you don't trust Google not to phone home or because you naively assume that it's not possible?
We were doing accurate voice recognition on Pentium 1 PCs with 32MB RAM more than 20 years ago. Why do you think that a vastly more powerful system, such as a modern smartphone, can't handle this?
Weasel Words (Score:4, Interesting)
No audio is ever sent to Google.
No. But the playlist along with location data probably is. Either in real time or forwarded when the network becomes available.
And then this is turned over to ASCAP/BMI to verify that commercial establishments you were in have paid their fees.
Re: (Score:1, Insightful)
This is Google we are talking about. ALL your personal data is immediately supplied to the Democrat party and the NSA/CIA deep state crisis actors who are WAITING to use it to confiscate your guns and imprison you and your family.
It works offline... (Score:2)
But then pre-packages the information to be sent surreptitiously with your daily or weekly Google report.
I dunno what people find so useful in assistants, crap like this, plus embedded AI to identify objects when you are taking photos and whatnot, but I see it as overengineering stuff and offering little to no convenience (when not actually making things even harder to do) while using you as testbed for future data collection schemes and whatnot.
This function in particular can only go both ways: either collec
They’re stuck (Score:3)
Google, Samsung, Apple... all their phones can now do pretty much everything their customers need, and are powerful enough where there’s not much practical gain in upgrading. These companies are basically stuck trying to sell us high priced gadgets which in truth are pretty much commodities now.
This new feature from Google doesn’t seem useful to me at all. But, given the choice between a phone able to do this and a phone which can turn me into an animated, talking poop emoji... I’d take this, thank you very much.
Sounds like an answer to a question nobody asked (Score:2)
So everyone knows there exist several Shazam-like apps and services that will identify a song that's playing. I have personally used this service maybe two times within the past four years. The 500MB of storage space wasted on the Now Playing database is just not worth it.
If you've ever used Shazam or SoundHound, you prob (Score:1)
Google Assistant / Now has been doing it for years too. You know, the core of the platform you're talking about.
Google voice search has done this for years! (Score:2)
OK, so maybe it doesn't do it offline. But who cares? I can already ask my Moto G "What song is this?" It listens, and tells me what's playing. I've used it numerous times, and it got it right every time!
Curious (Score:1)
Re: (Score:1)
It's for identifying songs that you hear from random sources (boom boxes, restaurant background music, etc.) that you didn't load and/or isn't coming from your own device.
And the 'Always On' (intermittent, actually - it runs every 60 seconds) means that you can get the info from your lock screen, and the local database means that it doesn't need any form of internet connection - no impact on your data plan.
Not that I fit their use case. I'd leave it switched off (the default).
Re: (Score:2)
Who really cares about how this is done? I'm much more interested in what the battery impact of such a useless feature is. Seriously, how often do most people use this feature, such that it would be useful having this run 24/7/365?
Just don't turn it on if you care that much. It isn't on by default.
Re: (Score:1)
There are enough real reasons to hate Google that one doesn't need to fabricate false ones. Those of us that don't like Google have even more motivation to call out bullshit complaints than fanboys, because fake problems desensitize people to real problems, often then thinking all complaints come from opposing fanboys.
Re: (Score:2)
I'm much more interested in what the battery impact of such a useless feature is.
It's unlikely to move the needle.
Songs usually last for a solid 2-3 minutes, so that limits the processing to maybe 500 analyses per day. That's basically like doing 500 searches per day, on fast storage (probably cached), probably using some kind of vsm or inverted index, or maybe a radix tree. Minimal cpu usage, minimal i/o, no gpu usage. I suspect a weather app is more harmful for the battery than this kind of thing because it involves the network stack.
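A quick sanity check on that daily count, assuming the phone hears music around the clock (a worst case). The 2 and 3 minute figures are the song lengths guessed above; the 60 second figure is the polling interval mentioned in another comment in this thread. The function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def lookups_per_day(minutes_between_lookups):
    """Lookups in 24 hours if one lookup happens per interval, nonstop."""
    return MINUTES_PER_DAY / minutes_between_lookups


per_3_min_song = lookups_per_day(3)
per_2_min_song = lookups_per_day(2)
per_60_seconds = lookups_per_day(1)

print(f"one lookup per 3-minute song: {per_3_min_song:.0f}")
print(f"one lookup per 2-minute song: {per_2_min_song:.0f}")
print(f"one lookup every 60 seconds:  {per_60_seconds:.0f}")
```

So "maybe 500" matches the 3-minute case, and even the every-60-seconds case stays in the low thousands of small local lookups per day.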
phail (Score:2)
*facepalm*
You act as though everyone is only going to listen to full songs of exactly 2-3 minutes when in reality people will be listening to song fragments and jumping from track to track at any time.
Before facepalming people, maybe you should RTFA. The service they describe is not for someone actively listening to music, it's to detect automatically whatever is playing in the background. For instance, when you're at Starbucks.
I'm not a fan of Google usually but this is a pretty interesting feature. Shazam is cool but most of the time when I hear a song that I like, my interest doesn't go as far as actually unlocking the phone and starting the app. But glancing at the lockscreen and seeing what is playi
Re: (Score:2)
Looks like you're illiterate then.
If someone has music playing in the background which is being changed at random intervals, whether because the song ended and the next began or because someone switched tracks, the app cannot know that unless it's continuously monitoring.
Nobody said it wasn't continuously monitoring. That's not the same as continuously recognizing.
I can see why you're confused; you have no idea how technology works. You must shit your pants when you see a dvr skipping commercials ("OMG THEY HAVE THE ENTIRE CORPUS OF TV ADS IN THAT BOX AND IT'S CONSTANTLY COMPARING THEM TO MY TV SHOW OMG OMG").
Re: (Score:2)
If so, there has to be continuous monitoring of song changes in order to trigger the recognition algorithm - and I'd call that continuous recognition.
This is basic logic.
Re: (Score:2)
If so, there has to be continuous monitoring of song changes in order to trigger the recognition algorithm - and I'd call that continuous recognition.
This is basic logic.
This is, in fact, very "basic" logic. The same kind of logic that can make anything be anything else, since it's anything.
Re: (Score:2)
I suppose this feature might be of intense interest to some, of casual interest to others, but many, such as myself, ask, "This is on my phone why?"
Making phone calls on a phone has long ago become a secondary, even a tertiary, feature.
Re: (Score:3)
"This is on my phone why?"
Because you installed it by mistake and are too silly to remove it?
Because you needed something to complain about?
Can you tell us? Why did you put it on your phone?
Re: (Score:2)
It isn't optionally put on the phone, it is put there by Google and requires rooting the phone to get rid of it.
And to the person who said I should stop complaining about smartphones, I was making an observation. There's a difference. If it came across as a complaint, then I could have expressed it better.
quick search tells me something else... (Score:2)
It’s worth noting that Now Playing is turned off by default. You have to explicitly turn it on in the setup flow when first starting your Pixel 2 or Pixel 2 XL, or in Settings (as shown above).