There Is a Racial Divide In Speech-Recognition Systems, Researchers Say (nytimes.com)
An anonymous reader quotes a report from The New York Times: Speech recognition systems from five of the world's biggest tech companies -- Amazon, Apple, Google, IBM and Microsoft -- make far fewer errors with users who are white than with users who are black, according to a study published Monday in the journal Proceedings of the National Academy of Sciences. The systems misidentified words about 19 percent of the time with white people. With black people, mistakes jumped to 35 percent. About 2 percent of audio snippets from white people were considered unreadable by these systems, according to the study, which was conducted by researchers at Stanford University. That rose to 20 percent with black people.
The study, which took an unusually comprehensive approach to measuring bias in speech recognition systems, offers another cautionary sign for A.I. technologies rapidly moving into everyday life. The Stanford study indicated that leading speech recognition systems could be flawed because companies are training the technology on data that is not as diverse as it could be -- learning their task mostly from white people, and relatively few black people. [...] The best performing system, from Microsoft, misidentified about 15 percent of words from white people and 27 percent from black people. Apple's system, the lowest performer, failed 23 percent of the time with whites and 45 percent of the time with black people.
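The relative size of the gaps is easy to lose in the list of percentages above. A small Python sketch that does the arithmetic on the rounded figures quoted in the summary (this is not the study's own analysis; the dictionary labels and the `disparity` helper are made up for illustration):

```python
# Error rates quoted in the summary: (white speakers, black speakers), in percent.
QUOTED_RATES = {
    "average word error rate": (19, 35),
    "Microsoft word error rate": (15, 27),
    "Apple word error rate": (23, 45),
    "unreadable snippets": (2, 20),
}


def disparity(white: float, black: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, ratio of black to white rate)."""
    return black - white, black / white


if __name__ == "__main__":
    for label, (white, black) in QUOTED_RATES.items():
        gap, ratio = disparity(white, black)
        print(f"{label}: {gap:+.0f} points, {ratio:.2f}x")
```

It prints a gap of +16 points and 1.84x for the overall average, 1.80x for Microsoft, 1.96x for Apple and 10.00x for unreadable snippets. So the best and worst systems differ in absolute error, but each shows a black-to-white ratio of roughly 1.8x to 2x.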
Oh (Score:5, Funny)
This is going to be good...
Re: (Score:2)
Racism works both ways. When I was in Shanghai last summer, the Xiaomi Mi-AI speaker in my mother-in-law's apartment misunderstood most of what I said.
I used it as an opportunity to improve my pronunciation. I repeated phrases until I got them right. The speakers even had built-in lessons for people learning Mandarin as a 2nd language.
Perhaps it is politically incorrect to say so, but people could use Alexa to learn to speak standard American English.
Re: (Score:3)
Standard American English = how Walter Cronkite would say it. Preferably without some of the newspeak ("impactful", "grow the economy") that has been added by media morons since his time. Plus maybe some tech jargon since The Internet didn't exist in his time.
Re: (Score:2, Informative)
None of this is racism. The devices are created to recognise the largest number of people in the areas where they are being deployed and/or sold. Many blacks in America don't speak English clearly; they tend to use lots of slang and elision and place the emphasis on the wrong syllable.
I bet if you went to England and tried to speak with your American accent, voice recognition systems there would have difficulty understanding you too. Again, it's not racism and there are too many morons these days who immediately ju
Re: (Score:2)
There are plenty of American Whites with HORRIBLE accents that would probably confound speech recognition systems. Did anyone pick out some swamp creoles from Louisiana for testing? Or how about rural whites from Georgia or Kentucky?
Re: (Score:3)
Did anyone pick out some swamp creoles from Louisiana for testing
Amazon doesn't have enough processing power to decipher Cajun. There would be datacenter fires from SoCal to NYC if the coonasses got ahold of an Alexa device.
Re: (Score:2)
Perhaps it is politically incorrect to say so, but people could use Alexa to learn to speak standard American English.
~ShanghaiBill
Were I you, I'd argue it's a wash.
Re: (Score:2)
Quoth the AC -
..."I ax you once and I ax you again", "noo phon who dis" of course it will confuse the AI expecting proper English.
Yup, that was my question - what black users did they test with (both for the study and for development). And if the systems are having issues with Ebonics or slang or just a heavy southern drawl mixed with some mumbling (my relatives in Minnesota have a hard time understanding me at times, which is sad 'cause both my parents are speech pathologists!) there are populations that will give it issues with a variety of skin colors.
Re:When people talk like... (Score:4, Interesting)
Try it with a Scotsman from the northernmost reaches of the highlands you can find, it'll be much higher than 35%.
The voice activated Lift (Score:4, Funny)
Mandatory viewing
https://www.youtube.com/watch?... [youtube.com]
(I think they could have done better by having a foreigner activate the lift just fine in their own language. And maybe have someone with a strong Welsh accent activate it by using an app that reads numbers, which of course they cannot download inside the lift.)
Re:When people talk like... (Score:4, Insightful)
Yes, you just couldn't wait to virtue signal on people who committed the unforgivable crime of noticing that some groups of people have different language standards than others. What a hero, you've saved us all.
Re:When people talk like... (Score:5, Insightful)
Except the "troll" has a point. I wouldn't be surprised if a heavy Scottish accent confuses the speech recognition as well. As a lifelong white southerner, I know that it helps the speech to text if I speak a bit more distinctly than I would in casual conversation.
It's not a race thing, it's a matter of accents and dialects being confusing from time to time. Kinda like in "My Cousin Vinny" when the judge asks "What is a ute?".
Re: (Score:2)
Voice to text does not translate the southern accent particularly well... my personal (admittedly small) sample suggests it requires two to three corrections per text.
I suspect it's similar for any regionally thick accent: "I pahked the cah in the yahd", for instance.
Of course, it's not going to trigger the same reaction if it seems to be discriminating against Americans of European descent in Boston or Dallas.
Re: (Score:2)
regionally thick accent: "I pahked the cah in the yahd", for instance.
That is a rhotic accent, as spoken here in Australia, most of the UK, much of Europe and Africa.
Google, Siri etc cope just fine. So I don't see why the American versions would not be programmed to recognise it.
But even as a human, I found some of the black accents in the US (e.g. Chicago) as unintelligible as some rural Irish or Scottish accents.
Re: (Score:2)
That said, you mean non-rhotic accents.
Re: (Score:2)
It's not a race thing, and it's not a matter of accents and dialects being confusing. It's a matter of the data set used to train the voice recognition algorithm representing the average way people pronounce their words. For the U.S., there are more whites, so the speech data used to train voice recognition is going to be closer to how whites talk, rather than blacks (or southerners, or hispanics, or any variety
Re: (Score:2)
Ideally, yes, but we're just getting speech recognition working half decently at all. The world is full of comedy where human speech recognition glitches or even fails entirely because of dialect and accent. It's also full of standup routines where comedians offer to translate English to English, see Jeff Foxworthy. See also Scottish Tweets. There are cases where an extreme dialect completely befuddles others, sometimes even others who speak largely the same dialect but in softer form.
Part of the issue migh
Re: (Score:2)
That's why a Cockney British accent is difficult for Americans to understand.
Really?? I'd have thought that would be one of the easiest regional accents to understand. Don't Americans love Michael Caine?
How to get accurate recognition (Score:5, Informative)
Re:How to get accurate recognition (Score:5, Funny)
Re: (Score:2)
I find it incredibly good now. I can dictate addresses with odd road names and Google Maps will get it right. In fact the more I over-think it the worse the result, if I just start talking normally with my normal accent and cadence it works better.
Re: (Score:2)
This. Little by little, speech recognition has improved to the point that it does a pretty good job of understanding the dialects it is trained on. I see no reason that it cannot do the same with others, as long as it is trained properly. It should be able to detect a dialect and adjust its interpretation algorithm appropriately.
My anecdotal experience speaking English and French to Google Assistant makes me think it can handle a language context-switch without too much trouble. Why not with dialects of the
Re: (Score:2)
Oh yes, language switching. I can talk to it in English or Japanese and it just works out which one I'm using and responds likewise without any perceptible delay. My Japanese accent is good enough that occasionally people ask if I am Japanese, but even so.
Re: (Score:2)
What do you want to bet that it's also mostly white people who buy those data collection speakers?
Teach Enunciation (Score:2)
https://www.youtube.com/watch?... [youtube.com]
Perfect demonstration video. Get lazy with your language, don't be surprised when you can't be understood.
From simpler times (Score:4, Informative)
Re:From simpler times (Score:4, Funny)
In the German version of this movie they dub these guys as speaking a really extreme Bavarian dialect. I got a kick out of it because that's where I learned to speak German so it sounded normal to me.
Re: (Score:2)
Extreme dialects... there's none better than the classics.
https://youtu.be/J2ePRj1HaF0?t... [youtu.be]
Race (Score:4, Interesting)
>"There Is a Racial Divide In Speech-Recognition Systems, Researchers Say"
Your race doesn't determine how you talk, your annunciation, your grammar, or your word choices.
Re:Race (Score:5, Funny)
Your race doesn't determine how you talk, your annunciation, your grammar, or your word choices.
Umm... yeah... and if you aren't yanked up to heaven directly, perhaps your annunciation was rejected for lack of enunciation during your prayers.
Re: (Score:2)
>"Umm... yeah... and if you aren't yanked up to heaven directly, perhaps your annunciation was rejected for lack of enunciation during your prayers."
LOL!!!
I misspelled it and the "auto-correction" wasn't what I meant at all. Gotta love those homonyms! This is text, but it is a perfect gaffe when discussing voice recognition. :)
Re:Race (Score:4, Insightful)
Race is highly correlated with certain accents and forms of English. While it's true that genetics have little to do with the accent you have, race is more of a cultural thing.
We don't really have a word for cultural bias, it tends to just get put down to race. Which is a shame because Scottish people have a lot of trouble with speech recognition.
Anyway it would suck if we all sounded the same because that was the only way to make computers understand us.
Re: (Score:2)
"We don't really have a word for cultural bias"
Ethnicist. The word you REALLY mean when you say 'racist.'
Re: (Score:2)
Anyway it would suck if we all sounded the same because that was the only way to make computers understand us.
You can say that again. Language diversity makes things better.
Re: (Score:3)
We already all sound the same on the phone because that's the only way to make tech support customers understand us
Re: (Score:2)
"We don't really have a word for cultural bias"
Uh...provincialism? Sort of...
accent VR (Score:5, Insightful)
Racial divide? What? (Score:5, Insightful)
I want to see the data. I want a room full of natural-born Americans of various races, all raised in the same state (let's say, Washington, as most say we have one of the more "neutral" accents in the US), and see what that does to the statistics.
We could just as easily, and accurately, say that speech recognition discriminates against those who live in the deep south.
There are educated people of every race and color who speak clearly and with good diction and grammar, and people of every race and color who speak with a heavy accent or use lazy pronunciation or horrible grammar. If people of one race or another have a higher percentage of people who speak poorly, that is not discrimination by the technology trying to interpret what they're saying.
FFS, why is this even news? Because we're all sick of coronavirus stories?
Re: Racial divide? What? (Score:4, Insightful)
Re: (Score:2)
Because we're all sick of coronavirus...
I see what you did there.
Re: (Score:2)
While its not universally true, in a lot of the US you'll find that Blacks speak with a completely different accent than Whites. Yes, even if they're from the same city and state.
Re: (Score:2)
Yes, but the way they speak is a result of their cultural/sub-cultural upbringing (nurture), not their race (nature). You can raise a White child in a Black family/neighborhood and a Black child in a White family/neighborhood and each child will take on the accents and idioms of their environment.
Thus, race isn't the issue. The issue is that speech-recognition systems are calibrated to recognize the developer's or programmer's accent... which isn't really so much of an "issue" so much as it is a shortco
"African American Vernacular English"? (Score:3, Insightful)
I don't even understand the acoustics problem... were they yelling? If so, that sounds like a sample-selection problem. Which brings us to the elephant in the room.
They tested twice as many blacks as whites, and don't note from where, etc. Perhaps the sampling of the African American Vernacular English isn't close enough to "American English" to be essentially spell-corrected. After all, Ebonics or Old English would also presumably fail recognition.
The authors racistly did not correlate for professional experience and education. Harking back to Smokey and the Bandit [youtu.be], smarts are regional. I would expect many white folk from certain less-populated areas that are statistically over-represented in birth-toes and under-represented in adult teeth to do even worse on speech recognition, but I'd also expect the black graduates of Ivy League schools to fare exceptionally well on the tests.
Re: (Score:2)
Attributes associated with certain ethnicity, such as overall size and thickness of the neck, do affect the pitch of the voice. Black men tend to have slightly lower registers on average than white men, although of course it's a bell curve and highly variable. Anyway, different frequencies mean different acoustics and of course the microphones are tuned and post-processed to isolate speech primarily by frequency range too. So for example you might find that some people at either end of the curve have their
Re: (Score:2)
I can well bel
Re: (Score:2)
Your post would be more helpful if you had corrected the error and on top of that explained why your correction is right and he is wrong ...
For me your post makes no sense, and I probably would have modded you: troll.
15% of the time (Score:3)
Re:15% of the time (Score:4, Interesting)
Are you using a dictation program, where every word matters? Or are you basing your impression of accuracy on a voice assistant like Siri or Alexa? In the latter case, you could say "Play me some tunes by the Allman Brothers Band", and the system could recognize "Play the Allmans", and you wouldn't know the difference, even though the word error rate was 77%.
I have been doing ASR R&D for about 30 years, and 15%, on average, for the speaker-independent error rate on real world (not laboratory) tasks is close enough to state of the art. There is of course a huge variation, based on speaking style, topic, noise conditions, microphone transfer function, etc. I would estimate the cross-speaker variance at about 10% (so 90% of speakers will experience somewhere between 5% and 25% word error rate). That a particular sub-population is out on the high end of that distribution is not surprising at all, particularly if you understand the weaknesses (and, yes, biases) of the model-building (both acoustic and language) algorithms.
Speaker adaptation to the rescue!
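The 77% figure above is word error rate: the substitutions, deletions and insertions found by a word-level edit distance, divided by the number of words actually spoken. A minimal Python sketch, assuming plain lowercase whitespace tokenization (real scoring pipelines usually also normalize punctuation and numbers first); `word_error_rate` is a hypothetical helper, not any vendor's API:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    spoken = "Play me some tunes by the Allman Brothers Band"
    heard = "Play the Allmans"
    print(f"{word_error_rate(spoken, heard):.1%}")
```

Running it prints 77.8%: seven edits over nine spoken words, for example deleting "me some tunes by", substituting "Allmans" for "Allman" and deleting "Brothers Band". The table is the usual Levenshtein recurrence applied to words instead of characters, which is why a command can still "work" while most of its words count as errors.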
Re: (Score:2)
Yeah, it does not make any sense.
Most speech recognition I have seen people use doesn't even have a 1% failure rate. I only used it rarely years ago on my Mac, but there the commands were "pre-sampled", aka I said each one ten times and connected it to the action, and afterwards it was basically 100% correct. I never used speech to text, as typing is about 5 times faster.
Really? (Score:2)
The Scots are black now?
Re: (Score:2)
By the definition of "black" published by my university's student union back when I was an undergrad, yes. It's the history of oppression by the English that qualifies them...
Re: (Score:2)
Is the "sorry" an apology for attributing to me a definition which I reported without supporting, or for assuming that I was writing from a US context when I gave no indication thereof? For the record, this was an English university; the student union is a student body, not formally part of the university; this definition was from a pamphlet produced by the SU's vice president for ethnic minorities, who was himself black; and it also claimed that more than 50% of the world population is black women. I wrote
Re: (Score:2)
Black people in the US generally define themselves through the shared experience of chattel slavery.
Well that's fucking stupid of them. Almost none of them experienced it.
Perhaps if they defined themselves through, I don't know, things like education, obeying the law and working for a living then they'd fix a lot of their communities' issues.
Is it race... or is it accents/dialects? (Score:2)
Aaron Earned an Iron Urn: https://www.youtube.com/watch?... [youtube.com]
Or ELEVEN!! : https://www.youtube.com/watch?... [youtube.com]
More questions... (Score:2)
Fascinating but given past results with facial recognition, I can't say I'm surprised.
My next question is, does this happen with other languages? Presumably this is because English-speaking engineers trained their models with predominantly white English samples. I wonder if the Spanish models favored Spanish, Mexican, Basque, or some other dialect or accent?
Moving a bit afield, as an American and native English speaker, I definitely think I can mostly tell when someone is speaking with a (oh God this is goi
Re: (Score:2)
Basque is not a dialect of Spanish. It's not even in the Indo-European language family.
Re: (Score:2)
Basque is not a dialect of Spanish. It's not even in the Indo-European language family.
My bad. I knew I was going to get that wrong. What I had in mind was the Basques probably speak Spanish with a distinct accent. I'm thinking of how the Welsh have a distinct accent and their own language.
Re: (Score:2)
I don't know many Basques, and maybe my ear isn't well enough tuned, but I've not noticed that they have a distinct accent. FWIW the main accent divide in Spain is north (with the lisp) vs south (predominantly without the lisp, although there are some regions which "invert" the lisp in a language variant called "ceceo"). A strong Andaluz accent also loses the ends of many words, a phenomenon which is also common in Cuba.
Re: (Score:2)
>"I definitely think I can mostly tell when someone is speaking with a (oh God this is going to sound racist) black accent."
What you said is not "racist" at all. Of course, the meaning of the word "racist" and "racism" has been skewed all to hell to mean just about ANYTHING now.
Detecting an accent is just an observation; and one no human can prevent. And there is no one "black" accent any more than there is one "white" one, or "latin" one, or "asian" one, or regional one. Passing on the observation as
Re: (Score:2)
does this happen with other languages?
It certainly does. While watching Das Leben der Anderen [wikipedia.org] with the director's comment track turned on, at one point he remarked that one of the Stasi characters spoke with a particular regional accent. The equivalent in the USA being a hillbilly. Unfortunately, the resulting humor would be lost on non-German speakers.
And then there's the British Cockney accent. Not something I can reliably identify, but in a group of Brits, it does stand out to me. Extra credit if you know the proper definition of a Cockney w
Re: (Score:2)
does this happen with other languages?
It certainly does. While watching Das Leben der Anderen [wikipedia.org] with the director's comment track turned on, at one point he remarked that one of the Stasi characters spoke with a particular regional accent.
For sure. I remember in German class, our teacher mentioning that one character in a video had a very broad Bavarian accent.
What I meant to ask is whether speech recognition systems have trouble with some accents but not others in non-English languages. For that matter, I wonder if English speech recognition systems can recognize Boston, New Jersey, bog-standard American, southern drawl, Texas twang, midwest, BBC, Cockney, Liverpool, Aussie, Kiwi, and the other English accents equally well. Sounds like a da
Re: (Score:2)
It is not racist.
For me it is the same, and I'm not even a native English speaker. Black women/men singing or talking simply sound "black", no idea why. I sometimes make a mistake in that regard, but that is super rare. It is most certainly not an accent, it is something anatomic I guess.
Re: (Score:2)
> My next question is, does this happen with other languages?
Yes. I live in Canada. France lost Quebec in the "French and Indian War" which ended in 1763 https://en.wikipedia.org/wiki/... [wikipedia.org] Since then "Quebecois" French has developed a very different accent and spoken style from "Parisien" (i.e. European French). The grammar is identical, and emails are not a problem, but a Quebecois speaker and a Parisien speaker can have difficulty understanding each other in a conversation, if their accents are extreme.
Can we get back to coronavirus? (Score:2)
I want to hear more about how Coronavirus is utterly destroying retail while Amazon hires hundreds of thousands of new human robots.
Location matters too? (Score:3)
I had a perfectly wonderful experience with my phone and voice recognition. Then I spent some quality time in a Middle Eastern country. The damned thing started taking location into account, and I'd search or try to transcribe simple things and get exactly zero correct words. It continually tried to do 'local' search items in Arabic and these things just don't work so well.
Just because you are a boring white guy doesn't mean your voice recognition is going to work.
A prominent example of failure from years ago (Score:2)
https://www.youtube.com/watch?... [youtube.com]
Any accent that is not standard American (Score:3, Informative)
As a Kiwi, I can tell you that all speech recognition software that I've tried struggles with my accent. I really do try putting on an American accent to get round it sometimes.
Amazon and some others now offer an Australian accent option, which is usually close enough to solve the problem for me.
Re: (Score:2)
I had a client one time who was Vietnamese. He had a THICK accent and I had trouble understanding him.
He was a doctor, and used Dragon NaturallySpeaking to dictate notes. This worked great for him; it had no trouble with his accent. When setting it up on a new computer, it ran through the calibration, having him read a paragraph a few times after selecting which region most closely represented him, and after that, it worked beautifully.
Re: (Score:2)
U'll drunk a bear wuf a Kiwi bruv.
Hick accent (Score:4, Informative)
It doesn't do very well with my white wife's speech either. Strong East Texas accent.
This article is racist (Score:2)
What's racist about this is the blindness to it being a CULTURAL problem. Speech recognition is blind. Literally.
Asinine reporting.
No big mystery here (Score:3, Insightful)
Re: (Score:2)
That the majority of black people think it's fun to talk in a heavy slang with poor grammar and muddled pronunciation? Give the poor computer a break, it needs Standard English.
Re: (Score:2)
The computer doesn't need standard English. It needs whatever language and accent it was programmed/trained to recognize.
Racial divide - speech recognition system research (Score:2)
Re: (Score:2)
God help the system if they test it on a Guyanese man, their dialect is unintelligible to me.
Re: (Score:2)
Pronouncing the words like it says in the dictionary is part of speaking excellent English (or whatever language.) Lots of "white" people don't speak English very well either, including many if not most of the actual English, so I'm not picking on you for being "brown".
It's not white or even American English, since even the unaccented newscaster lack-of-accent is actually an accent which comes here from England. It's just English. You may have a great vocabulary and impeccable grammar, but if you can't pron
Example (Score:2)
User: "Alexa, I'd like to ax you a question."
Alexa: "OK, I've found some nearby places that sell axes..."
I guess I'll burn in hell a little for that one, but I just couldn't resist.
Can it understand Pikeys? (Score:2)
Irish ones in particular.
"Periwinkle blue"
Doesn't matter (Score:2)
Siri still looks stuff up on Wikipedia and expects me to read it instead of reading it for me. If I wanted to do that I would have already done that, biatch.
In the Ghetto (Score:3)
Really? Is it race?
No, it turns out it isn't. It's lazy editing.
The researchers took two datasets of voice samples. One spoken in "African American Vernacular English (AAVE)" the second one being "Voices of California (VOC)".
tl;dr: Turns out that ghetto slang is more difficult to understand than regular English. Say whaaaat? I'm a non-native speaker with enough English that I've been mistaken for a native speaker - and I find it quite a lot more difficult than the language recognition did.
Every black person I've ever met in person that doesn't speak ghetto was very easy to understand, and frankly speaking I never even noticed any difference. And I doubt highly that there IS a difference. But yeah, if ya grew up in da hood it probly is a bat daffacult ta understand ya, yo!
When one trend consistently appears on thousands (Score:4, Insightful)
of measurements, in many areas, fields or places, maybe there is a reason why that is not "discrimination".
Human biodiversity is the most hated subject on Earth. Evolution requires species to exist in different populations and groups, and it requires those to be different. Evolution could not have happened without it.
People who claim they follow science in every way suddenly argue for Creationism when they claim humans are all exactly identical from birth and in every aspect. There is no middle ground, either: (people are born with the same capacities AND there are no group differences AND evolution does not apply to humans) XOR (human populations with genetic differences exist).
Eat your heart out. I am looking forward to people arguing that humans have evolved like all other species BUT humans are all identical and have no discernible differences in localized populations.
There is a broader issue illustrated here (Score:2)
There is an assumption that computers are somehow neutral arbiters of facts and that biases do not exist in their decisions, when in fact there are inherent biases, since humans did the programming that delivers the outputs. That doesn't mean the programmers intentionally introduced biases, just that they did not recognize their biases when they created the algorithms used by the computer.
The danger is relying on the computer's outputs on the assumption it is unbiased. Companies tout computer based asse
It's not about race.... (Score:2)
Example: "Eats shoots and leaves" could have multiple meanings depending on whether there is comma "Eats, shoots and leaves" or not.
Well Duh! proportional test data (Score:2)
Re: (Score:2)
The bigger issue is why these large tech companies don't specifically address an economic market of over 40 million people.
There is a lot of variety in the speech of those 40 million people. They ain't all talking the same.