AI Google Software Technology

There Is a Racial Divide In Speech-Recognition Systems, Researchers Say (nytimes.com) 155

An anonymous reader quotes a report from The New York Times: Speech recognition systems from five of the world's biggest tech companies -- Amazon, Apple, Google, IBM and Microsoft -- make far fewer errors with users who are white than with users who are black, according to a study published Monday in the journal Proceedings of the National Academy of Sciences. The systems misidentified words about 19 percent of the time with white people. With black people, mistakes jumped to 35 percent. About 2 percent of audio snippets from white people were considered unreadable by these systems, according to the study, which was conducted by researchers at Stanford University. That rose to 20 percent with black people.

The study, which took an unusually comprehensive approach to measuring bias in speech recognition systems, offers another cautionary sign for A.I. technologies rapidly moving into everyday life. The Stanford study indicated that leading speech recognition systems could be flawed because companies are training the technology on data that is not as diverse as it could be -- learning their task mostly from white people, and relatively few black people. [...] The best performing system, from Microsoft, misidentified about 15 percent of words from white people and 27 percent from black people. Apple's system, the lowest performer, failed 23 percent of the time with whites and 45 percent of the time with black people.

This discussion has been archived. No new comments can be posted.

  • Oh (Score:5, Funny)

    by ArchieBunker ( 132337 ) on Monday March 23, 2020 @05:53PM (#59864516)

    This is going to be good...

    • This is going to be quite a show. I got popcorn cooking in the microwave, do you want some??
      • Racism works both ways. When I was in Shanghai last summer, the Xiaomi Mi-AI speaker in my mother-in-law's apartment misunderstood most of what I said.

        I used it as an opportunity to improve my pronunciation. I repeated phrases until I got them right. The speakers even had built-in lessons for people learning Mandarin as a 2nd language.

        Perhaps it is politically incorrect to say so, but people could use Alexa to learn to speak standard American English.

        • Comment removed based on user account deletion
          • Standard American English = how Walter Cronkite would say it. Preferably without some of the newspeak ("impactful", "grow the economy") that has been added by media morons since his time. Plus maybe some tech jargon since The Internet didn't exist in his time.

        • Re: (Score:2, Informative)

          by Anonymous Coward

          None of this is racism. The devices are created to recognise the largest number of people in the area that they are being deployed and/or sold. Many blacks in America don't speak English clearly, they tend to use lots of slang, elision and place the emphasis on the wrong syllable.

          I bet if you went to England and tried to speak with your American accent, voice recognition systems there would have difficulty understanding you too. Again, it's not racism and there are too many morons these days who immediately ju

          • There are plenty of American Whites with HORRIBLE accents that would probably confound speech recognition systems. Did anyone pick out some swamp creoles from Louisiana for testing? Or how about rural whites from Georgia or Kentucky?

            • by rho ( 6063 )

              Did anyone pick out some swamp creoles from Louisiana for testing

              Amazon doesn't have enough processing power to decipher cajun. There would be datacenter fires from SoCal to NYC if the coonasses got ahold of an Alexa device.

          • 'at's right, gov-nar. Tell 'em Yanks off! Give 'em whats they got comin' to 'em, I says!
        • Perhaps it is politically incorrect to say so, but people could use Alexa to learn to speak standard American English.

          ~ShanghaiBill

          Were I you, I'd argue it's a wash.

        • People shouldn't use Alexa at all.
  • by layabout ( 1576461 ) on Monday March 23, 2020 @05:59PM (#59864550)
    Full disclosure: I've been using speech recognition since the early 90s and I'm aware of the problems of training. If you want to use speech recognition successfully, speak clearly, don't mumble, use complete sentences and speak using written grammar and syntax. Speaking colloquially or casually is almost a guaranteed loss of 15% on accuracy.
    • by nonBORG ( 5254161 ) on Monday March 23, 2020 @06:10PM (#59864572)
      No, you don't understand: the system is so smart that it actually recognized the speech, but also recognized the race of the speaker and then threw up errors just to discriminate. It can also be programmed to be gender-biased and homophobic; in fact, there are sliders to set the level of each of these in a secret room at Google HQ (and the rest; Tim Cook set the homophobic slider all the way down at Apple, but sometimes they might bump it up while he is not watching). Just like on The Simpsons, where you saw Burns and Smithers in the power control room sometimes.
    • by AmiMoJo ( 196126 )

      I find it incredibly good now. I can dictate addresses with odd road names and Google Maps will get it right. In fact the more I over-think it the worse the result, if I just start talking normally with my normal accent and cadence it works better.

      • This. Little by little, speech recognition has improved to the point that it does a pretty good job of understanding the dialects it is trained on. I see no reason that it cannot do the same with others, as long as it is trained properly. It should be able to detect a dialect and adjust its interpretation algorithm appropriately.

        My anecdotal experience speaking English and French to Google Assistant makes me think it can handle a language context-switch without too much trouble. Why not with dialects of the

        • by AmiMoJo ( 196126 )

          Oh yes, language switching. I can talk to it in English or Japanese and it just works out which one I'm using and responds likewise without any perceptible delay. My Japanese accent is good enough that occasionally people ask if I am Japanese, but even so.

    • It seems plausible that this is just a matter of volume, not about speaking clearly. More white people means more data means better speech recognition.

      What do you want to bet that it's also mostly white people who buy those data collection speakers?
  • https://www.youtube.com/watch?... [youtube.com]

    Perfect demonstration video. Get lazy with your language, don't be surprised when you can't be understood.

  • Race (Score:4, Interesting)

    by markdavis ( 642305 ) on Monday March 23, 2020 @06:10PM (#59864570)

    >"There Is a Racial Divide In Speech-Recognition Systems, Researchers Say"

    Your race doesn't determine how you talk, your annunciation, your grammar, or your word choices.

    • Re:Race (Score:5, Funny)

      by Fringe ( 6096 ) on Monday March 23, 2020 @06:17PM (#59864612)

      Your race doesn't determine how you talk, your annunciation, your grammar, or your word choices.

      Umm... yeah... and if you aren't yanked up to heaven directly, perhaps your annunciation was rejected for lack of enunciation during your prayers.

      • >"Umm... yeah... and if you aren't yanked up to heaven directly, perhaps your annunciation was rejected for lack of enunciation during your prayers."

        LOL!!!

        I misspelled it and the "auto-correction" wasn't what I meant at all. Gotta love those homonyms! This is text, but it is a perfect gaff when discussing voice recognition. :)

        • by nazrhyn ( 906126 )
          Since I didn't start it, but it already got started, I feel obligated to s/gaff/gaffe/. Please don't hurt me.
    • Re:Race (Score:4, Insightful)

      by AmiMoJo ( 196126 ) on Monday March 23, 2020 @06:47PM (#59864694) Homepage Journal

      Race is highly correlated with certain accents and forms of English. While it's true that genetics have little to do with the accent you have, race is more of a cultural thing.

      We don't really have a word for cultural bias, it tends to just get put down to race. Which is a shame because Scottish people have a lot of trouble with speech recognition.

      Anyway it would suck if we all sounded the same because that was the only way to make computers understand us.

      • by Khyber ( 864651 )

        "We don't really have a word for cultural bias"

        Ethnicist. The word you REALLY mean when you say 'racist.'

      • Anyway it would suck if we all sounded the same because that was the only way to make computers understand us.

        You can say that again. Language diversity makes things better.

      • Correlation is not causation AND that is not an argument against his point.
      • We already all sound the same on the phone because that's the only way to make tech support customers understand us

      • "We don't really have a word for cultural bias"

        Uh...provincialism? Sort of...

  • accent VR (Score:5, Insightful)

    by phantomfive ( 622387 ) on Monday March 23, 2020 @06:10PM (#59864576) Journal
  • by FuegoFuerte ( 247200 ) on Monday March 23, 2020 @06:11PM (#59864578)

    I want to see the data. I want a room full of natural-born Americans of various races, all raised in the same state (let's say, Washington, as most say we have one of the more "neutral" accents in the US), and see what that does to the statistics.

    We could just as easily, and accurately, say that speech recognition discriminates against those who live in the deep south.

    There are educated people of every race and color who speak clearly and with good diction and grammar, and people of every race and color who speak with a heavy accent or use lazy pronunciation or horrible grammar. If people of one race or another have a higher percentage of people who speak poorly, that is not discrimination by the technology trying to interpret what they're saying.

    FFS, why is this even news? Because we're all sick of coronavirus stories?

    • by Type44Q ( 1233630 ) on Monday March 23, 2020 @06:15PM (#59864600)
      "Divide and Conquer" is never outdated.
    • by sjames ( 1099 )

      Because we're all sick of coronavirus...

      I see what you did there.

    • by Octorian ( 14086 )

      While it's not universally true, in a lot of the US you'll find that Blacks speak with a completely different accent than Whites. Yes, even if they're from the same city and state.

      • by eepok ( 545733 )

        Yes, but the way they speak is a result of their cultural/sub-cultural upbringing (nurture), not their race (nature). You can raise a White child in a Black family/neighborhood and a Black child in a White family/neighborhood and each child will take on the accents and idioms of their environment.

        Thus, race isn't the issue. The issue is that speech-recognition systems are calibrated to recognize the developer's or programmer's accent... which isn't really so much of an "issue" so much as it is a shortco

  • by Fringe ( 6096 ) on Monday March 23, 2020 @06:14PM (#59864590)
    The report claims that the speech recognition systems have trouble with two specific areas:
    1. Acoustics
    2. African American Vernacular English

    I don't even understand the acoustics problem... were they yelling? If so, that sounds like a sample-selection problem. Which brings us to the elephant in the room.

    They tested twice as many blacks as whites, and don't note from where, etc. Perhaps the sampling of the African American Vernacular English isn't close enough to "American English" to be essentially spell-corrected. After all, Ebonics or Old English would also presumably fail recognition.

    The authors racistly did not correlate for professional experience and education. Harking back to Smokey and the Bandit [youtu.be], smarts are regional. I would expect many white folk from certain less-populated areas that are statistically over-represented in birth-toes and under-represented in adult teeth to do even worse on speech recognition, but I'd also expect the black graduates of Ivy League schools to fare exceptionally well on the tests.

    • by AmiMoJo ( 196126 )

      Attributes associated with certain ethnicities, such as overall size and thickness of the neck, do affect the pitch of the voice. Black men tend to have slightly lower registers on average than white men, although of course it's a bell curve and highly variable. Anyway, different frequencies mean different acoustics and of course the microphones are tuned and post-processed to isolate speech primarily by frequency range too. So for example you might find that some people at either end of the curve have their

    • by mccalli ( 323026 )
      I understand the acoustics issue. I'm white northern English, though have been living in the south of England for a while now. There are some words I say that I can't get Siri to recognise without changing my vowel pronunciations. For example the letter "u" tends to be deeper, which gives words like "London" a different sound in the north than that in the south, even when the grammar and diction are otherwise identical. It sounds 'muddier', and I'm aware of that and have to force correct.

      I can well bel
  • by phantomfive ( 622387 ) on Monday March 23, 2020 @06:15PM (#59864602) Journal
    It says the systems misidentified words about 19 percent of the time with white people. Is that really true? That means it's misunderstanding every other sentence, which doesn't seem to match my own experience, or the experience I've observed when other people use it.
    • Re:15% of the time (Score:4, Interesting)

      by rogerz ( 78608 ) <rogerNO@SPAM3playmedia.com> on Monday March 23, 2020 @07:11PM (#59864764)

      Are you using a dictation program, where every word matters? Or are you basing your impression of accuracy on a voice assistant like Siri or Alexa? In the latter case, you could say "Play me some tunes by the Allman Brothers Band", and the system could recognize "Play the Allmans", and you wouldn't know the difference, even though the word error rate was 77%.
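The word error rate in the Allman Brothers example can be checked with a short sketch. This assumes the common definition of WER (word-level edit distance divided by the number of reference words) and a simple lowercase, whitespace-based tokenization; the function name `word_error_rate` is made up for illustration and is not taken from any particular ASR toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; real evaluations usually apply more text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the reference words seen so far
    # (initially none) into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "Play me some tunes by the Allman Brothers Band",
    "Play the Allmans",
)
print(f"{wer:.1%}")
```

Under these assumptions the alignment needs six deletions plus one substitution ("Allman" to "Allmans") against nine reference words, i.e. 7/9, which is roughly the 77% quoted above.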

      I have been doing ASR R&D for about 30 years, and 15%, on average, for the speaker-independent error rate on real world (not laboratory) tasks is close enough to state of the art. There is of course a huge variation, based on speaking style, topic, noise conditions, microphone transfer function, etc. I would estimate the cross-speaker variance at about 10% (so 90% of speakers will experience somewhere between 5% and 25% word error rate). That a particular sub-population is out on the high end of that distribution is not surprising at all, particularly if you understand the weaknesses (and, yes, biases) of the model-building (both acoustic and language) algorithms.

      Speaker adaptation to the rescue!

    • Yeah, it does not make any sense.

      Most speech recognition I have seen people use has less than a 1% failure rate. I only used it rarely, years ago on my Mac, but there the commands were "pre-sampled", i.e. I said each one ten times, connected it to the action, and afterwards it was basically 100% correct. I never used speech to text, as typing is about 5 times faster.

    • I don't have a strong accent, nobody ever has trouble understanding me in person, but I don't bother with speech to text on my phone because I have to go back and correct at least one word per sentence. So that seems about right.
  • The Scots are black now?

    • by pjt33 ( 739471 )

      By the definition of "black" published by my university's student union back when I was an undergrad, yes. It's the history of oppression by the English that qualifies them...

  • Aaron Earned an Iron Urn: https://www.youtube.com/watch?... [youtube.com]

    Or ELEVEN!! : https://www.youtube.com/watch?... [youtube.com]

  • Fascinating but given past results with facial recognition, I can't say I'm surprised.

    My next question is, does this happen with other languages? Presumably this is because English-speaking engineers trained their models with predominantly white English samples. I wonder if the Spanish models favored Spanish, Mexican, Basque, or some other dialect or accent?

    Moving a bit afield, as an American and native English speaker, I definitely think I can mostly tell when someone is speaking with a (oh God this is goi

    • by pjt33 ( 739471 )

      Basque is not a dialect of Spanish. It's not even in the Indo-European language family.

      • Basque is not a dialect of Spanish. It's not even in the Indo-European language family.

        My bad. I knew I was going to get that wrong. What I had in mind was the Basques probably speak Spanish with a distinct accent. I'm thinking of how the Welsh have a distinct accent and their own language.

        • by pjt33 ( 739471 )

          I don't know many Basques, and maybe my ear isn't well enough tuned, but I've not noticed that they have a distinct accent. FWIW the main accent divide in Spain is north (with the lisp) vs south (predominantly without the lisp, although there are some regions which "invert" the lisp in a language variant called "ceceo"). A strong Andaluz accent also loses the ends of many words, a phenomenon which is also common in Cuba.

    • >"I definitely think I can mostly tell when someone is speaking with a (oh God this is going to sound racist) black accent."

      What you said is not "racist" at all. Of course, the meaning of the word "racist" and "racism" has been skewed all to hell to mean just about ANYTHING now.

      Detecting an accent is just an observation; and one no human can prevent. And there is no one "black" accent any more than there is one "white" one, or "latin" one, or "asian" one, or regional one. Passing on the observation as

    • by PPH ( 736903 )

      does this happen with other languages?

      It certainly does. While watching Das Leben der Anderen [wikipedia.org] with the director's comment track turned on, at one point he remarked that one of the Stasi characters spoke with a particular regional accent. The equivalent in the USA being a hillbilly. Unfortunately, the resulting humor would be lost on non-German speakers.

      And then there's the British Cockney accent. Not something I can reliably identify, but in a group of Brits, it does stand out to me. Extra credit if you know the proper definition of a Cockney w

      • does this happen with other languages?

        It certainly does. While watching Das Leben der Anderen [wikipedia.org] with the director's comment track turned on, at one point he remarked that one of the Stasi characters spoke with a particular regional accent.

        For sure. I remember in German class, our teacher mentioning that one character in a video had a very broad Bavarian accent.

        What I meant to ask is whether speech recognition systems have trouble with some accents but not others in non-English languages. For that matter, I wonder if English speech recognition systems can recognize Boston, New Jersey, bog-standard American, southern drawl, Texas twang, midwest, BBC, Cockney, Liverpool, Aussie, Kiwi, and the other English accents equally well. Sounds like a da

    • It is not racist.

      For me it is the same, and I'm not even a native English speaker. Black women/men singing or talking simply sound "black", no idea why. I sometimes make a mistake in that regard, but that is super rare. It is most certainly not an accent, it is something anatomic I guess.

    • > My next question is, does this happen with other languages?

      Yes. I live in Canada. France lost Quebec in the "French and Indian War", which ended in 1763 https://en.wikipedia.org/wiki/... [wikipedia.org]. Since then "Quebecois" French has developed a very different accent and spoken style from "Parisien" (i.e. European French). The grammar is identical, and emails are not a problem, but a Quebecois speaker and a Parisien speaker can have difficulty understanding each other in a conversation, if their accents are extreme.

  • I want to hear more about how Coronavirus is utterly destroying retail while Amazon hires hundreds of thousands of new human robots.

  • by skogs ( 628589 ) on Monday March 23, 2020 @06:58PM (#59864726) Journal

    I had a perfectly wonderful experience with my phone and voice recognition. Then I spent some quality time in a Middle Eastern country. The damned thing started taking location into account, and I'd search or try to transcribe simple things and get exactly zero correct words. It continually tried to do 'local' search items in Arabic and these things just don't work so well.

    Just because you are a boring white guy doesn't mean your voice recognition is going to work.

  • by nut ( 19435 ) on Monday March 23, 2020 @07:17PM (#59864776)

    As a Kiwi, I can tell you that all speech recognition software that I've tried struggles with my accent. I really do try putting on an American accent to get round it sometimes.

    Amazon and some others now offer an Australian accent option, which is usually close enough to solve the problem for me.

    • I had a client one time who was Vietnamese. He had a THICK accent and I had trouble understanding him.

      He was a doctor, and used Dragon Naturally Speaking to dictate notes. This worked great for him, had no trouble with his accent. When setting it up on a new computer, it ran through the calibration, having him read a paragraph a few times, after selecting which region most closely represented him, and after that, it worked beautifully.

  • Hick accent (Score:4, Informative)

    by srwood ( 99488 ) on Monday March 23, 2020 @07:18PM (#59864784)

    It doesn't do very well with my white wife's speech either. Strong East Texas accent.

  • What's racist about this is the blindness to it being a CULTURAL problem. Speech recognition is blind. Literally.

    Asinine reporting.

  • by noobiedoobiedo ( 6194604 ) on Monday March 23, 2020 @07:38PM (#59864858)
    For how long will we continue to dance around the obvious? How many excuses must be paraded out in defense?
    • That the majority of black people think it's fun to talk in a heavy slang with poor grammar and muddled pronunciation? Give the poor computer a break, it needs Standard English.

      • by eepok ( 545733 )

        The computer doesn't need standard English. It needs whatever language and accent it was programmed/trained to recognize.

  • What about brown people? I speak excellent English, even if I say so myself. Most speech recognition systems still struggle with my pronunciations - so I guess I don't speak good American (white). If the paper analyzed just white and black accents, it left the brown ones out.
    • God help the system if they test it on a Guyanese man, their dialect is unintelligible to me.

    • Pronouncing the words like it says in the dictionary is part of speaking excellent English (or whatever language.) Lots of "white" people don't speak English very well either, including many if not most of the actual English, so I'm not picking on you for being "brown".

      It's not white or even American English, since even the unaccented newscaster lack-of-accent is actually an accent which comes here from England. It's just English. You may have a great vocabulary and impeccable grammar, but if you can't pron

  • User: "Alexa, I'd like to ax you a question."

    Alexa: "OK, I've found some nearby places that sell axes..."

    I guess I'll burn in hell a little for that one, but I just couldn't resist.

  • Irish ones in particular.

    "Periwinkle blue"

  • Siri still looks stuff up on Wikipedia and expects me to read it instead of reading it for me. If I wanted to do that I would have already done that, biatch.

  • by Tom ( 822 ) on Tuesday March 24, 2020 @12:52AM (#59865684) Homepage Journal

    Really? Is it race?

    No, it turns out it isn't. It's lazy editing.

    The researchers took two datasets of voice samples. One spoken in "African American Vernacular English (AAVE)" the second one being "Voices of California (VOC)".

    tl;dr: Turns out that ghetto slang is more difficult to understand than regular English. Say whaaaat? I'm a non-native speaker with enough English that I've been mistaken for a native speaker - and I find it quite a lot more difficult than the language recognition did.

    Every black person I've ever met in person that doesn't speak ghetto was very easy to understand, and frankly speaking I never even noticed any difference. And I doubt highly that there IS a difference. But yeah, if ya grew up in da hood it probly is a bat daffacult ta understand ya, yo!

  • by phoenix321 ( 734987 ) on Tuesday March 24, 2020 @03:28AM (#59865880)

    of measurements, in many areas, fields or places, maybe there is a reason why that is not "discrimination".

    Human biodiversity is the most hated subject on Earth. Evolution requires species to exist in different populations and groups, and it requires those to be different. Evolution could not have happened without it.

    People who claim they follow science in every way suddenly argue for Creationism when they claim humans are all exactly identical from birth and in every aspect. There is no middle ground, either: (people are born with the same capacities AND there are no group differences AND evolution does not apply to humans) XOR (human populations with genetic differences exist).

    Eat your heart out. I am looking forward to people arguing that humans have evolved like all other species BUT humans are all identical and have no discernible differences in localized populations.

    • by Twinbee ( 767046 )
      Whilst I agree that racial differences exist (including bone density, sporting ability, proneness to disease and even vision [archive.is]), I have a hard time believing accent and dialect, or even the voicebox itself, is affected significantly by differences in DNA.
  • There is an assumption that computers are somehow neutral arbitrators of facts and that biases do not exist in their decisions, when in fact there are inherent biases since humans did the programming that delivers the outputs. That doesn't mean the programmers intentionally introduced biases; just that they did not recognize their biases when they created the algorithms used by the computer.

    The danger is relying on the computer's outputs on the assumption it is unbiased. Companies tout computer based asse

  • The real problem is with the English language itself. Having stolen/acquired/bastardized words and structure from other languages, and with a grammar and spelling structure that is, at the very best, broken, the reasonable chance of AI comprehending speech at 100% is slim to none.

    Example: "Eats shoots and leaves" could have multiple meanings depending on whether there is a comma ("Eats, shoots and leaves") or not.

  • If your test data has proportional samples from lots of groups, those who are minorities will have less data. Those minorities will test worse. They will test even worse, the more they skew from the majority. See how well Chinese, or Scots or Mexicans or Cajuns do on these tests. The smaller the minority, the worse it will do. They also treat colored as a block, as if they all speak the same. NYC blacks are different from Miami and LA blacks. That is the racist part, knowing Scots are different from Alabam
  • Comment removed based on user account deletion
