Google Privacy Your Rights Online

Google Wants Your Voice Data 138

00_NOP writes "Peter Norvig, Google's director of research, has told New Scientist that one of the reasons the search engine launched Google Voice is that it needs more human voice data to perfect the sort of 'big data, simple algorithm' probabilistic approach to translating voices to text that drives Google Translate. Norvig says that no one is listening to your calls on Google Voice — it is simply their servers trying to get the translation right."
This discussion has been archived. No new comments can be posted.

  • by C_Kode ( 102755 ) on Tuesday May 03, 2011 @08:48AM (#36010032) Journal

    I will say that the translation of my voice mails is terrible. Although, how can you tell if it is translated correctly if you don't listen to it? You can look for proper English, but even some of my translations are proper English yet still incorrect. (names, etc come out wrong.) Though most of the time it's just a jumbled mess that I can't deduce the actual meaning of.

    • by Seumas ( 6865 )

      Your voicemails/transcriptions have a button you can check to mark whether or not it was accurate. Presumably, that is what they mean. Nobody besides *you* listens to them. On the other hand, if that is somehow not the case . . . . then . . . fuck no.

      • by thePig ( 964303 )

        In that case, they can use YouTube videos for this, right? Their automatic translation is quite horrible - they could use the good/bad check there too.
        Actually they can translate the same videos every time people see them - and until quite a high percentage of people say yes, they can test it again.
        Also, when they have more than 200 million videos on YouTube, why do they need to store data from Google Voice - which is much more personal and important?

        • If the message doesn't require a double check, there's no reason for them to store it, as they don't have any way of knowing whether or not it was accurate. However, for ones that they do have to go back and analyze, there's a good reason why they'd want to store them. In a word: regressions. Without a body of samples which were tough, they don't have any way of gauging whether or not they're truly making progress, as improvements could just as easily be in the quality of the samples that they are trying to tra

          • No"BODY" is listening, but computers are analyzing every call and transcribing it to text?

            Hmm, I would guess there would need to be at least some spot-checks that the transcription is working properly.

            And isn't there some kind of federal wiretapping law preventing this or is it a "well, we told you we were listening in on every call"?

            And methinks it just might be easier for the gov't to get these transcriptions instead of the actual audio recordings. And more convenient too, because it's much faster to read

            • No"BODY" is listening, but computers are analyzing every call and transcribing it to text?

              Hmm, I would guess there would need to be at least some spot-checks that the transcription is working properly.

              You do, when you click the check-mark or the red X after listening to it. However, that is a good point -- here Google says nobody will listen to your voicemail, but they still let you listen to it. So much for don't be evil! (This is sarcasm.)

              As for other people getting ahold of the transcriptions... not to be too facetious, but given how bad the quality currently is, nobody should worry about that yet. ;)

              • Well, wouldn't it be worse if the transcription was wrong and the SWAT team comes to your house because they think you are making a bomb instead of just noting that the movie bombed...

            • Hmm, I would guess there would need to be at least some spot-checks that the transcription is working properly.

              The only time somebody at Google listens (well, is allowed to listen) to your voicemail or recordings is when you click the red X and consent to their review.

      • by yakatz ( 1176317 )

        Your voicemails/transcriptions have a button you can check to mark whether or not it was accurate.

        And once you click that button, you have the option to donate your message so they can use it to improve their software. Example: http://img808.imageshack.us/img808/242/unled90.png [imageshack.us]

    • There is a checkmark and an X you can click if the translation is good/bad.
    • Although, how can you tell if it is translated correctly if you don't listen to it?

      They don't listen in, at least not initially. You do. If it's not translated correctly, there's a box for you to check that gives them permission to listen.

      • They've gotten some pretty good data from me. My Google Voice number isn't currently published, so all of the voicemails I get are wrong numbers (or tests). They are completely incomprehensible to me, and to Google Voice - although one did a fair approximation of gibberish English (I think it was in some African dialect). Most seem to be in African languages, although a few are central European sounding. Good luck getting a good translation - but that's the magic that Google is trying to accomplish: tra

        • by heypete ( 60671 )

          Oh, and I do get a fair number of advertisements and service calls. If you had an appointment with Comcast last Thursday, the tech called the wrong number - that's why he didn't show up. Google did a good job on the translation though...

          I have similar experiences, only with email instead of voicemail.

          This confuses me as I've owned my own domain for just under 12 years (and was the original registrant), and am the only recipient at the entire domain. It's my personal address and a few generic role accounts (postmaster@, abuse@, etc.) that forward to my personal account. There is no reason why someone named "Diane" should use my email address (pete@[my slashdot username].com) when scheduling an Apple Store appointment in South Carolina (not

    • My experience with a Nexus One...

      If I used voice transcription on the phone directly, as in I spoke to my phone to do a Google search or write a text, it came out fairly well. There was the occasional error but mostly on things like names.

      But my transcripts from the voice mails I receive were often trash.

      I guess it has to do with the sound-quality: it probably uses the original high-quality recording locally so it performs good Google searches. Meanwhile the compression and static over the phone line (

      • I've had the same experience, my voicemail transcripts are garbage.

        When I speak into my phone to write a text or run a search I make a point to speak slowly and enunciate very clearly. I suspect most people don't make the same kind of effort in voicemails.

      • That is absolutely correct. Speech recognition is incredibly sensitive to sound quality and background noise. When you're talking into your phone, the ASR software has a very good quality sample to work with. On the phone line, however, you're dealing with a very noisy signal, which causes huge degradations in the quality of recognition.
    • They have another server that checks the first server's translation. Part of their work is checking that server's effectiveness, too.

    • What's funny is this is the case for everyone but both of my grandmothers... when either one of them leaves a voicemail, it transcribes > 95% accurately, better than anyone else (30-60% usually). I guess it works well for old women raised in the U.S. Midwest.
      • Maybe they use a land line. I've noticed my friends who use land lines (small sample size) have high accuracy rates relative to average.
  • by rwv ( 1636355 ) on Tuesday May 03, 2011 @08:49AM (#36010036) Homepage Journal
    How do servers assess whether they've got the translation correct without having a human-in-the-loop to listen to the conversation and concurrently read what the server translated? Maybe the data is anonymous by the time it gets to a human, but it seems like humans need to interface with the voice data somehow to validate that the server is translating accurately.
    • Re:Self-checking (Score:5, Informative)

      by msauve ( 701917 ) on Tuesday May 03, 2011 @08:55AM (#36010126)
      "How do servers assess whether they've got the translation correct without having a human-in-the-loop to listen to the conversation and concurrently read what the server translated?"

      If you log into your Google Voice page, and look at a translated message, in the lower right corner there is the question - "Transcript useful?" along with yes/no checkboxes. If you check one, it asks if you want to "donate" that VM to improve the translations, you can answer yes/no/never:

      Want to help Google's automated transcription get better? Donated voicemails will be listened to, manually transcribed, and used to improve our transcribing server's accuracy. They are only used for this purpose.

      • It's too bad they don't let you fix the transcription. Even if they're worried about people trying to poison their data (like people talk about with ReCaptcha), they could at least let the user fix his own view of it.

        • Or they could just give it a once-over to make sure it's okay. I mean, someone is already taking the time to listen to the voicemail and re-transcribe it manually. It would take less time to verify that the user-generated transcription is correct.
      • I don't understand why they don't let me correct it myself.. Not only would I be helping them with their algorithm, but then I would have a known good transcript to save in my voicemail for later searching.. (Searching is pretty bad if you're looking for a word or phrase that is often mis-interpreted).

        • I don't understand why they don't let me correct it myself.. Not only would I be helping them with their algorithm, but then I would have a known good transcript to save in my voicemail for later searching.. (Searching is pretty bad if you're looking for a word or phrase that is often mis-interpreted).

          I suspect the fear of trolls mistranslating is too great to allow anyone to do it.

          Imagine some people "helping" the algorithm by insisting that if someone says, say, "music" it actually "translates" to "we're no strangers to love".

    • You are the human in the loop. If you read the VM from the website you have an option to submit the recording if the translation wasn't helpful.
  • by Anonymous Coward

    >simply their servers trying to the translation right
    >trying to the translation right
    >the translation right

    Nicely done.

  • oh shit! Google accidentally my voice data!
  • by Quantum_Infinity ( 2038086 ) on Tuesday May 03, 2011 @08:55AM (#36010132)
    I can tell that they want the voice data badly. They make it very difficult to delete call and voicemail history. You can't delete more than 10 records at a time, and even then they go into the trash and keep piling up there. You can delete the data from the trash, but again only 10 at a time. There is no option to empty the trash. Their help section says that the history is purged from the trash automatically after 30 days, except that it isn't. My call history sits in the trash indefinitely unless I painstakingly delete all history 10 records at a time.
    • Good news - after you go through all the painstaking, tedious work of deleting them ten at a time, they're really gone forever!

      *snort* yeah, I couldn't keep a straight face while typing that. Hopefully you couldn't keep one while reading it, either.

    • Sounds like somebody should write a 'droid app to fix that...
  • This is the price of "free" services.

    • by afex ( 693734 )
      no, we're not - and I will GLADLY pay this price, if not more. If you are a GV user, then you understand how insanely valuable the service is.

      on the flipside, if you're a privacy advocate (which I absolutely get!), then don't sign up.

      the thing that I don't get is people shouting "i told you so" at all the people that use google services - we get it, we already know they want to mine our data - and we WANT to give it to them!

      *disclaimer: i do not use GV due to the fact that I MMS more than a teenage girl
      • We are derisive towards "Hai This is Facebook. Plz give us ur full name, address, cell phone number, age, and eye color so we can give you five Farmville sheep."

        But you bring up the more interesting case, "Awesome service versus abused data". (Shout out to Holland and TomTom for yesterday's example.)

        Or here, Google Translate vs ... a billion hours of juicy phone calls!

        Speech is "Audio" - All we need is a hacker and a Wikileaks Dump!

      • This is the price of "free" services.

        on the flipside, if you're a privacy advocate (which I absolutely get!), then don't sign up.

        And sometimes you pay the price [eff.org] anyway, without your consent, and when the services aren't "free". Given the (lack of) choice of my data and money going to a company that isn't really innovating that much, or to an entity that's ostensibly trying to move the state of the art forward and using data at least partially to this end, I can't see how a privacy advocate would consider GV worse than their current voice service.

        A "false-sense-of-privacy advocate", perhaps, or one who refuses phone or voicemail se

        • I use a typewriter to hand-transcribe my answering machine messages from cassette tape, you insensitive clod!
  • by dargaud ( 518470 ) <slashdot2@SLACKW ... net minus distro> on Tuesday May 03, 2011 @09:16AM (#36010386) Homepage
    I gave up trying to get voice software to work over a decade ago. The reason is that I'm trilingual and use all 3 daily. So the software needs to be able to:
    - understand a lousy accent: there are some words I cannot and will never be able to pronounce 'right'
    - recognize what language is being spoken (having those 3 and only those 3 preset in the options)
    Now I haven't tried Google Voice, but none of the software I've tried or heard about could even remotely do those two basic things.
    • Those two requirements don't exactly strike me as "basic".
      • by Anonymous Coward

        They are basic (in the sense that they are a must) for a tool like google voice's.

        To tell apart different languages and guess when a word is a foreign language word.
        I know three languages. Mother language (Spanish), second language (English) and some Japanese.
        I can still tell when somebody is speaking different languages that I barely know (German, French, Chinese, Portuguese, Italian).

        To put into letters words that it does not have in its vocabulary and not just try and find the closest match

        To understand dif

    • Considering that outside of Africa only a very small fraction of the population speaks more than two languages let alone fluently, I don't think that it's a basic request.

      • Considering that outside of Africa only a very small fraction of the population speaks more than two languages let alone fluently, I don't think that it's a basic request.

        It strikes me that Europe might disagree with you on that.

        • I don't know, even in Spain where they have a dozen languages, few people speak three of them (or two + English).

          • by jodio ( 569370 )

            I'm Canadian and I speak 3 languages. English, French, and Rubbish. Mostly Rubbish.

        • I could be wrong, but I doubt most Europeans are fluent in more than two languages, and I bet a significant number aren't fluent in multiple languages. The reason I'm singling out Africa there is that in parts it's very common for people to speak not just one or two, but three, four or more languages and to have to learn a new language at marriage so that they can communicate.

          Trust me, Europeans have nothing on that.

          • have to learn a new language at marriage so that they can communicate.

            Married people communicate?

            • Funny, I was just thinking that everyone has to learn a new language to communicate after they get married.

      • The Malaysians I know do. They all speak a local dialect (their first language), plus they speak Mandarin (the regional language taught for normal communication), plus they speak English (the language they learn to conduct business). They can't really co-mingle the applications, either; they don't know many business terms in their native language so English isn't just an option, it's preferred for those uses.

      • by Abreu ( 173023 )

        Va te faire enculer, pendejo

      • by xaxa ( 988988 )

        Considering that outside of Africa only a very small fraction of the population speaks more than two languages let alone fluently, I don't think that it's a basic request.

        40% of Europeans speak English well enough to have a conversation (not including native speakers). In some areas (Switzerland, Belgium, Luxembourg, places near country borders) it's not unusual to speak an extra language.

        If you're a European child you speak [your version of European], learn English at school because English is useful, and if you like languages you might choose another; in the same way, perhaps, that an American child might choose to learn Spanish.

        I know a little French and a little German -

      • by tdknox ( 138401 )

        My fiancee speaks 6 languages fluently, like a native, and switches between them with an ease that impresses the shit out of me. They are Korean (she is Korean), Tagalog, Mandarin Chinese, English, Japanese and French. The first time she came to America, Immigration didn't want to let her in because her English was so good they didn't believe that she had never been here before.

        So, yeah, there are lots of people that speak multiple languages. Just not, unfortunately, in America.

    • Google Voice is pretty amazing, even at the early stages. It can auto-recognize many languages. It can also do a fair job with bad pronunciation. Google Translate is able to understand my Spanish - which is fairly incomprehensible. Of course, my vocabulary is limited to that provided by public high schools in rural North Carolina back in 1982. Good luck getting me to do more than ask for directions to the bathroom.

      I'll steal a joke from David Sedaris - a single year of high school Spanish just isn't e

      • I haven't heard the Sedaris bit, but in France you can actually ask for "fire" when you want a "light". "Can I get a light?" -> "Tu as du feu?" (I do get your point though.)

        But even ignoring that you can ask for "fire" in France, auto-translators have to realize that you can't word-for-word translate, but also understand that written language is often different than spoken language. Some have started to pick up on this, but they all still have a ways to go.

    • I used to work with a "trilingual" fella.
      Born in Italy, raised in France, and then lived in the USA for 17+ years.
      He effectively spoke no language.
      Bad Italian, worse French and jumbled English.
      Is there an app for that?

      • by Abreu ( 173023 )

        I have had "conversations" with people coming back to Mexico after living years in the US... poor fellows, their Spanish is incomprehensible and their English sounds like a racist joke.

        Pochos have really developed their own pidgin language :S

      • by xaxa ( 988988 )

        One of my flatmates speaks four languages fluently. I think her English is at least as good as mine was when I was 16, in some cases better (consistently using "whom", "and I" etc correctly).

        I'm learning German; when I get a little better I want to have German-speaking Sundays (Deutsche sprechen Sonntags?... Wrong conjugation of sprechen, probably the wrong word order, oh well, I've not been learning long.).

    • by Ksevio ( 865461 )

      You might have more luck with software that has a training phase before doing recognition. I've had great success with Germans, Chinese and Israelis speaking English through extra training.

      As for the language, I'm not sure about consumer software, but phone systems (and probably Google Voice...details are slim on it) will often run recognizers for several languages and then go with the recognition with the highest confidence.
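
      A minimal sketch of the "run recognizers for several languages, go with the highest confidence" idea, for anyone curious what that selection step looks like. Everything named here is hypothetical: the `Recognizer` type, `pick_best_language`, and the stub recognizers are illustrative only, not the API of Google Voice or of any real phone system. It also assumes the confidence scores are comparable across languages, which real systems often have to calibrate for.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface: a recognizer takes raw audio bytes and returns
# (transcript, confidence), with confidence assumed to lie in [0, 1].
Recognizer = Callable[[bytes], Tuple[str, float]]


def pick_best_language(
    audio: bytes, recognizers: Dict[str, Recognizer]
) -> Tuple[str, str, float]:
    """Run every language's recognizer and keep the most confident result."""
    if not recognizers:
        raise ValueError("need at least one recognizer")
    # Decode the same audio once per candidate language.
    results = {lang: rec(audio) for lang, rec in recognizers.items()}
    # Choose the language whose recognizer reported the highest confidence.
    best_lang = max(results, key=lambda lang: results[lang][1])
    text, confidence = results[best_lang]
    return best_lang, text, confidence


def make_stub(text: str, confidence: float) -> Recognizer:
    """Stand-in for a real speech recognizer (illustration only)."""
    return lambda audio: (text, confidence)


if __name__ == "__main__":
    # Three preset languages, as in the trilingual use case above.
    recognizers = {
        "en": make_stub("bone sure two lemon", 0.32),
        "fr": make_stub("bonjour tout le monde", 0.81),
        "de": make_stub("bonn schuh tule mond", 0.27),
    }
    print(pick_best_language(b"fake-audio", recognizers))
```

      Restricting the dictionary to the handful of languages a user actually speaks keeps the cost down, since every extra language means another full decoding pass over the same audio.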

  • Maybe google voice tried to translate the summary:

    it is simply their servers trying to the translation right.

  • And you're surprised why? All voice apps do this. Always have, always will, at least until it's perfected, and we're a long, long way from perfecting it.

    Steven

  • "it is simply their servers trying to the translation right"
    I think you a word in your sentence.

  • by sglewis100 ( 916818 ) on Tuesday May 03, 2011 @09:29AM (#36010554)

    "Peter Norvig, Google's director of research, has told New Scientist that one of the reasons the search engine launched Google Voice is that it needs more human voice data to perfect the sort of 'big data, simple algorithm' probabilistic approach to translating voices to text that drives Google Translate. Norvig says that no one is listening to your calls on Google Voice — it is simply their servers trying to the translation right."

    I think Google Voice translated the last part of that sentence.

  • by Anonymous Coward

    The translation is off pretty far most of the time for my voicemail. But they do end up to be entertaining. Here are two actual translations from google voice:

    1) Okay, I don't know why it takes for ever, for your voicemail to pick up. But anyway, I was just calling to tell you that we forgot to while. I will and I told Mrs. Smith and this is best but she signed about it, so I'm gonna shout in the car and have it for her after I pick her up, bye. You Get Out virtual slot is not with us. So, wish me luck. I

  • Right. Everyone get on Google Voice with funny unnatural accents, unusual intonation and non-native grammar! Let's skew their data.
    • by Anonymous Coward

      Epic wonderment! I'm get near to this.

    • Let's speak MS at it...

      Dear aunt, let's set so double the killer delete select all.

  • Comment removed based on user account deletion
  • by JSBiff ( 87824 ) on Tuesday May 03, 2011 @09:38AM (#36010678) Journal

    There was a Userfriendly.org strip years ago which pretty much summarizes my experience with voice recognition software for the past 15 years. . .

    I can't find the link to the comic anymore, but basically, one of the guys in the office had been trying to use voice recog software. Some of his coworkers come to his office. He's not there, but on the screen, they wonder about the mysterious message, "Cod Am Pizza Ship".

  • by PPH ( 736903 )

    Y'all kin have mah voce data. Sheeeit! I warn't doin' nutin' wid it anyhows.

  • I'd be willing to let this happen if google then released the derived heuristics as free open source software. I'll share if you share.
  • Google is knot an evil umpire. They our hear 2 us with wheel whirled problems. Please stop bash tag google. All your words belong to us.

  • Google announces Google Voice, noting that it will be archiving and auto-transcribing subscribers' phone calls.

    "But don't worry," Google Voice Product Evangelist Boris Badinov said at the press conference announcing the service's launch. "We promise full interoperability with Google Docs, GMail, Android, and the NSA. Also, the artist who does the daily search engine doodle has promised to come up with a really cool, shiny logo."

    And around the world, geeks sign up in droves, many noting that they didn't even

  • So.. They are trying to teach GALaDOS (Google Artificial Lifeform and Disk Operative System) to speak?
  • This is the same as how they put "Closed Captions" on youtube videos.

    Google has no interest in crowd-sourcing the translation or transcription of speech, they want it all automated.

    Which is why YouTube Closed Captions SUCK!

  • GV has had an opt-in feature to basically donate each GV transcription to Google with an indicator of whether you thought it was a good or bad transcription. Is there any evidence that Google is delving into your voice data without your consent? Were you expecting a GV transcription *without* a machine at least analyzing the voice data that came in and then discarding it?
  • Just grab audio from thousands of dialogs or talks on YouTube and test it out.

  • There will only be one voice operating System. Google wants to get there first.
  • I'm sure Google wants to be able to identify people by their voices... I mean in the digital era, where you have pseudonyms, multiple identities, and where portable (micro)phones are proliferating, it would be a mistake not to take advantage of the opportunity to identify or disambiguate people's identity thanks to their voice signature... I'm going to choose my future phone operating systems very carefully...
  • I love the idea of the feature -- I hate stopping to listen to voice mails. ...but I got the most ludicrous / hilarious translation yesterday. Pure poetry!

    "Hi, My name is The bring the Anderson and I was interested in ordering. I'll call. Sarah, Mrs. Kate, on the Hudson birthday. He really liked all here in the for you. So, anyway, I would see it for next Sunday. I'm not sure where they said of. It's Hey Lady, Thank tonight anyway. If you can just give me a back. My phone number is 972."

    Good to know about