Rest In Peas — the Death of Speech Recognition 342
An anonymous reader writes "Speech recognition accuracy flatlined years ago. It works great for small vocabularies on your cell phone, but basically, computers still can't understand language. Prospects for AI are dimmed, and we seem to need AI for computers to make progress in this area. Time to rewrite the story of the future. From the article: 'The language universe is large, Google's trillion words a mere scrawl on its surface. One estimate puts the number of possible sentences at 10^570. Through constant talking and writing, more of the possibilities of language enter into our possession. But plenty of unanticipated combinations remain, which force speech recognizers into risky guesses. Even where data are lush, picking what's most likely can be a mistake because meaning often pools in a key word or two. Recognition systems, by going with the "best" bet, are prone to interpret the meaning-rich terms as more common but similar-sounding words, draining sense from the sentence.'"
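The "best bet" failure mode the summary describes can be pictured as a decoder taking an argmax over combined acoustic and language-model scores. The following toy Python sketch (all numbers invented, not from the article) shows how a strong language-model prior can override ambiguous acoustics and swap a rare, meaning-rich phrase for a common one:

```python
# Toy illustration (numbers invented): a decoder that always takes the single
# "best" hypothesis lets a strong language-model prior override ambiguous
# acoustics, swapping a rare, meaning-rich phrase for a common one.

candidates = {
    # hypothesis: (acoustic log-score, language-model log-score)
    "rest in peace": (-4.1, -2.0),   # common phrase, strong LM prior
    "rest in peas":  (-3.9, -9.5),   # what was actually said, rare phrase
}

def total_score(scores, lm_weight=1.0):
    acoustic, lm = scores
    return acoustic + lm_weight * lm

best = max(candidates, key=lambda h: total_score(candidates[h]))
print(best)  # -> "rest in peace": the language model wins despite weaker acoustics
```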
Buffalo buffalo (Score:5, Insightful)
Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo.
Mod parent up (Score:3, Informative)
Would that I had mod points today.
The above is a valid English sentence and a poignant example of how difficult it is to parse language without knowledge of semantics.
Focus, Dammit. (Score:2)
"What, all of us?"
Re: (Score:2)
The sixth sheik's sixth sheep's sick.
[so, say said sentence sextuply...]
Re:Focus, Dammit. (Score:5, Funny)
Re:Mod parent up (Score:5, Interesting)
There's nothing special about computers here, though; people have to do that with other people too... let's not kid ourselves into thinking that humans are immune to misunderstandings. No, the more you get to know someone, and the way they think and express themselves, the better you become at communicating with them. Different words have different connotations for different people. It can take a lot of work to get all of these down, and it'd be no different with a computer. For effective communication, you'd train with it and build up a common language that might seem like nonsense to outsiders... and I, for one, welcome this.
Re:Mod parent up (Score:4, Insightful)
You give computers way too much credit.
More likely it would think you said "Dear aunt, let's set so double the killer delete select all".
My experience with telephone Voice Rejection Systems is that they get what you say wrong more often than not, especially if you have a deep voice.
Re:Mod parent up (Score:4, Funny)
Nonsense! For example, a real human could never mishear the phrase "guide dog" as "gay dog" and refuse to let a dog into a restaurant.
Well to be fair, understanding Australians is an order of magnitude more difficult than understanding English speech.
(ducks)
Re:Mod parent up (Score:4, Insightful)
Not necessarily. Speech recognition doesn't fail when it can't figure out elaborate grammatical constructs and lexical ambiguities. Speech recognition fails because it can't figure out simple sentences in conditions humans can [youtube.com].
Re: (Score:3)
Actually, I went to Buffalo one time to try to get a picture of this occurring to put on the Wikipedia page. It's harder than you'd think, since the skyline isn't that huge and buffalo do a lot of nothing most of the time. But here's one of Buffalo buffalo about to buffalo Buffalo buffalo that's thinking about buffaloing Buffalo buffalo.
http://tinypic.com/r/xcqa06/5 [tinypic.com]
Re:Mod parent up (Score:5, Interesting)
Neural networks are part of the story, but many of the ideas from ANNs have been improved upon when more structured settings are available. There is actually a resurgence in deep neural networks right now, though.
Re: (Score:3, Insightful)
When you have lots of data, you don't have to build any "expert" knowledge into a learner.
This isn't really quite so clear-cut. Feature engineering, model structure, model training techniques, and so on all bias statistical learners towards different parts of the hypothesis space. Hidden Markov models (the standard in speech recognition) clearly constitute a data-driven approach, but usually they predict diphones (which capture the transitions between speech sounds) rather than phones themselves. That is, "cat" is recognized not by predicting a [k] followed by an [ae] followed by a [t], but
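The diphone idea in the (truncated) comment above can be sketched in a few lines. This is purely illustrative; the function name and "sil" edge marker are made up for the example, not taken from any particular toolkit:

```python
# Illustrative sketch of the diphone idea: model transitions between adjacent
# phones (with a silence marker at the edges) rather than phones in isolation.

def to_diphones(phones, edge="sil"):
    """Turn a phone sequence into phone-to-phone transition units."""
    padded = [edge] + list(phones) + [edge]
    return [(a, b) for a, b in zip(padded, padded[1:])]

print(to_diphones(["k", "ae", "t"]))
# [('sil', 'k'), ('k', 'ae'), ('ae', 't'), ('t', 'sil')]
```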
Re:Mod parent up (Score:5, Informative)
Re: (Score:3, Funny)
This rest ponds was and turd you sings peach recon nation soft where
Re:Buffalo buffalo (Score:5, Funny)
That comma is just out of place and makes the sentence hard to parse.
Re:Buffalo buffalo (Score:5, Insightful)
Re:Buffalo buffalo (Score:5, Insightful)
Most people won't be able to parse the sentence, though. I know I can't. I have no idea how to interpret it as anything but a string of nouns. My guess is, even fewer would be able to parse it if spoken (the capitals and the comma are, I assume, important hints). It'd be unrealistic and unproductive to require speech systems to actually do better than most humans on the task; if many of us can't parse the sentence then why expect a computer to do so?
Better overall benchmark: require it to have the ability of a competent but not perfect second-language user. We're long used to dealing with that level of proficiency, whether because the conversant is a foreigner, a child, or has a dialect very different from our own.
Badger badgers badger Badger badgers (Score:2)
Buffalo buffalo
Likewise, Badger badgers Badger badgers badger, badger Badger badgers. (UW taxideans harassed by UW taxideans harass other UW taxideans.) Oh, and mushroom mushroom [badgerbadgerbadger.com].
Re:Badger badgers badger Badger badgers (Score:5, Funny)
snaaaaaaake!
Re:Buffalo buffalo (Score:5, Informative)
Buffalo bison whom other Buffalo bison bully, themselves bully Buffalo bison.
Re:Buffalo buffalo (Score:5, Informative)
For those that don't know:
http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo
'Buffalo bison whom other Buffalo bison bully, themselves bully Buffalo bison'.
Re: (Score:3, Interesting)
If only speech recognition's problems were limited to these low-probability sentences. I've had a number of SR systems fail to recognize my "yes" and "no" responses.
Re: (Score:3, Interesting)
I'd never heard this one before, guess it's the American version! The one I was taught was a complaint by a pub landlord to their sign writer:
You've left too much space between pig and and and and and whistle
Re: (Score:2)
Your marklar is well marklar.
Re: (Score:2)
Key words (Score:3, Interesting)
> meaning often pools in a key word or two
It's true.
My own hearing is not great. I often miss just a word or two in a sentence. But they are often key words, and missing them leaves the sentence meaningless. If I counted the words I understand correctly I'd probably have a 95% success rate. But if I counted the sentences I understand correctly, I'd be around 80%. So I get by, but I tend to annoy people when I ask for repeats over one missed word.
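The gap between the word-level and sentence-level rates is roughly what you'd expect if errors were independent: assuming (purely for illustration) about five words per sentence and that any single missed word spoils the sentence, a 95% per-word rate works out to about a 77% per-sentence rate, close to the 80% figure above.

```python
# Purely illustrative: assume ~5 words per sentence and that any single
# missed word spoils the whole sentence.
word_rate = 0.95
words_per_sentence = 5
sentence_rate = word_rate ** words_per_sentence
print(round(sentence_rate, 2))  # ~0.77, in the same ballpark as the 80% above
```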
Re:Key words (Score:5, Funny)
& It's true.
My own ... is not great. I often miss ... a word or two in a sentence. But they are often ... words, and missing them leaves ... sentence meaningless. If I counted the words I understand ... I'd probably have a 95% success rate. But if I counted the ... I understand correctly, I'd be around ...%. So I get by, but ... tend to annoy people when I ask for ... over one missed word.
I can see how this would be annoying.
Re:Key words (Score:4, Funny)
Can see how WHAT would be annoying?
Android Speech Recognition Rules (Score:5, Informative)
I hardly type anything into my HTC Incredible. Google's voice recognition, which is enabled on every textbox, works just about perfectly.
Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.
Comment removed (Score:5, Funny)
Re:Android Speech Recognition Rules (Score:5, Funny)
What Dave said: "Open the pod bay doors, HAL."
What HAL heard: "Open the hot babe pornz, HAL."
HAL's speech recognition and morality programming* combined to give the famous reply, "I'm sorry, Dave. I'm afraid I can't do that." HAL knew certain things would have been too titillating to an all-ages film audience in 1968.
* Only for the film version. In the book version, it would have caused undue frustration to the reader, unable to see what Bowman was viewing. In that case, it was HAL's etiquette programming.
Re: (Score:3, Interesting)
I hardly type anything into my HTC Incredible. Google's voice recognition, which is enabled on every textbox, works just about perfectly.
Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.
I've tried Google voice recognition, but I found that it just detected gibberish unless I spoke with a fake American accent.
Re:Android Speech Recognition Rules (Score:5, Funny)
... so its voice recognition works about as well as that of the average American then? ;)
Re: (Score:3, Funny)
We might get to the point where we can write text messages by speaking, then the person on the other end could have them read aloud by a computer. That would be so awesome. Maybe some day we'll be able to transfer the actual sound of our voices.
Re: (Score:3, Insightful)
I gave up voice dialing when I sneezed and dialed my father. I coughed and got my mother, but no matter what I did, a loud fart would not call my brother; it just opened the web browser and visited Slashdot.
Okay, the last one might be a lie, but the sneezing to get my father is true. Try it: make funny sharp noises at your voice dialer and see what it dials.
Let me guess (Score:5, Funny)
That summary was written with speech recognition software?
Re: (Score:3, Funny)
Hesitant grate watts peach wreck ignitions oft where kin dew ferrous?
AI (Score:5, Insightful)
Re:AI (Score:4, Insightful)
Re: (Score:3, Interesting)
Re: (Score:3, Informative)
Google voice recognition already does exactly that. It matches words against their database of words commonly used together, built via their search engine.
This message was composed using Android voice recognition on my Nexus One phone. I had to manually correct 2 words out of the whole post.
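A toy sketch of the "words commonly used together" idea described above, with made-up bigram counts standing in for Google's corpus statistics:

```python
# Toy sketch: pick among acoustically confusable words using how often each
# candidate follows its neighbor in a large text corpus. Counts are invented.

bigram_counts = {
    ("recognition", "software"): 90000,
    ("recognition", "soft"): 400,
    ("wreck", "ignition"): 50,
}

def pick(prev_word, candidates):
    """Choose the candidate that most often follows prev_word in the corpus."""
    return max(candidates, key=lambda w: bigram_counts.get((prev_word, w), 0))

print(pick("recognition", ["software", "soft"]))  # -> "software"
```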
That's Because... (Score:5, Funny)
*Disclaimer: Poster is not responsible for attempts resulting in unintended AI development and/or end of the world scenarios brought on by such an irresponsible endeavor.
Well duh. (Score:4, Funny)
Even humans mishear speech.
"'Scuse me while I kiss this guy"
That misheard lyric is so common that there's a book about misheard lyrics with that as the title.
--
BMO
Re: (Score:2)
Eggcorns [wikipedia.org] constitute another great example of how humans get this wrong.
Re: (Score:2, Funny)
"Time flies like an arrow; fruit flies like a banana."
Time flies (Score:2)
"Time flies like an arrow; fruit flies like a banana."
Is a time fly an archer or a DDR player?
Re:Well duh. (Score:5, Funny)
That misheard lyric is so common that there's a book about misheard lyrics with that as the title.
I know! A surprising number of people think Hendrix was talking about kissing the sky, rather than embracing the experimental, counter-culture, free-love nature of the '60s, simply because they don't like to think of their testosterone-filled hero sucking face with another dude. Like, get over it! "Kiss the sky" doesn't even make any sense unless you're on some kind of mind-altering substance, and there's no way Jimi would have put something like that in his body!
Re:Well duh. (Score:5, Interesting)
Or have an ounce of poetry in you... ;)
Hmm... I guess I don't have that since I don't know what it is. That's okay, I can find out with the help of my AI using the latest in voice recognition software! Computer, what is "poetry"?
Computer: "Poetry" is a form of literary art, frequently using an organized metric and rhyme scheme, that attempts to evoke an emotional response in the reader through the use of metaphor.
Huh, okay, that's interesting. But computer, what is a metaphor?
Computer: A "meta" is for people who lack the capabilities to contribute directly to a field or endeavor, but who still wish to sound educated and useful by discussing the nature of the field or endeavor itself. Example: "Physics has way too much math for me, but meta-physics is right up my alley!"
Yeah, now I'm just confused.
Re: (Score:2)
Crap, crap, crap into the toilet bowl (Score:2)
Re: (Score:2)
Even humans mishear speech.
"'Scuse me while I kiss this guy"
That misheard lyric is so common that there's a book about misheard lyrics with that as the title.
-- BMO
There was a Tool song my friends and I argued over the lyrics of for quite some time. I think it was the prison sex song. I was sure it said, "...my lamb and martyr, this will be over soon...". My friends were sure it was, "...my loving mother, this will be over soon...". Considering the topic of the song, I suppose it could have had yet another level of depravity to it with the whole mom/incest angle; my perverted friends sure thought so, even though I didn't think it made much sense.
Re: (Score:2)
But...but computers are supposed to be perfect!
Re: (Score:2)
Well, I checked a few pages' worth of content on that site, and I must say it looks like most of these "misheard" lyrics are people trying to make a funny (often sex-related) joke (or simply lacking the correct vocabulary) rather than actually mishearing the lyric. Some of the songs on that list have lyrics that are so clear it's nearly impossible to hear them wrong. I certainly didn't find any that I heard wrong.
Disclaimer: I'm not a native English speaker, though I am a musician.
Number of sentences? (Score:2, Insightful)
One estimate puts the number of possible sentences at 10^570
What a completely useless metric. It makes sense to examine the context and meaning of speech in order to accurately transcribe words, but the number of possible sentences doesn't seem to accurately describe the problem here...
Windows 7 (Score:3, Interesting)
I've been using VR in Win7 for a few weeks now. I can honestly say that after a few trainings, I'm near 100% accuracy. Which is 15% better than my typing!
Re:Windows 7 (Score:4, Informative)
Not Dead Yet (Score:2, Insightful)
Totally Not Dead Yet (Score:5, Interesting)
A few years back I worked for an awesome company that built IVR (interactive voice response) systems.
We had voice-driven interactive systems that would give the caller a variety of mental health tests (we worked a lot on identifying depression, early-onset dementia, Alzheimer's, and other cognitive issues).
The voice recognition wasn't perfect, but we had a review system that dealt with a "gold standard". I wrote a tool that would allow a human being to identify individual words and to label them. Then we would run a number of different voice recognition systems against the same audio chunk and compare their output to the human version. It effectively allowed us to unit test our changes to the voice recognition software.
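A minimal sketch of that kind of gold-standard check: score the recognizer's output against a human transcript with word error rate (a standard metric, not necessarily the exact tool described above), so changes to the recognizer can be regression-tested automatically.

```python
# Word error rate: (substitutions + insertions + deletions) / reference length,
# computed with a classic dynamic-programming edit distance over words.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("count from one to twenty", "count from won to twenty"))  # 0.2
```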
Dialing in a voice recognition system is an amazing process. The number of properties, dictionaries, scripts, and sentence-forming engines involved is mind-blowing.
Two of the hardest tests for our system were things like: count from 1 to 20, alternating between numbers and letters as fast as you can (for example, 1-A-2-B-3-C), and list every animal you can think of.
The 1-A-2-B was killer because when people speak quickly, their words merge. You literally start creating the sound of the A while the end of the 1 is still coming. It makes it extremely difficult to identify word breaks and actual words. And if you dial in a system specifically to parse that, you'll wind up with issues parsing slower sentences.
The all-animals question had a similar issue: people would slur their words together, and the dictionary was huge. It was even more challenging in one of the studies, which was nationwide; we had to deal with phonetic spellings for both northeast-coast and southern-state accents. What was even worse was that there were no sentences, so we couldn't count on predictive dictionary work to identify the most likely word out of those that matched the phonetics.
That said, getting voice recognition to work on pre-scripted commands and sentences was pretty easy.
And I can only imagine the process has been improving in the years since. Although we were looking into SMS-based options, not out of dislike for IVR, but because our usage studies with children showed most of them were skipping the voice system and using the keypad anyway. So why bother with IVR if the study's target demographic was young people?
-Rick
World model (Score:2, Informative)
Speech recognition mechanisms and algorithms are not the whole problem. What needs to back them up is called a "world model," and, as the name implies, this can be large and open-ended. Humans can correct spoken and heard errors on the fly because they have an underlying world model.
Time flies like an arrow fruit flies like a banana (Score:3, Insightful)
Having said that, Dragon works fairly well, provided you modulate your speech.
If you want a laugh with Dragon, turn away from the screen and talk normally, then look at what it has transcribed...
Re: (Score:2)
Training (Score:2)
Speech recognition is higher intelligence (Score:2)
Intelligence is basically composed of pattern recognition, with two general categories. One) Specific pattern recognition is logic, math, etc. It requires incredibly exact matches. Yes or no. 1.0, not 1.00001. Computers are very, very good at that.
Two) General pattern recognition is creativity, art appreciation, and our capacity to invent. It requires people to ignore a ton of irrelevant data and instead focus on only one aspect of identity, reco
Since I don't have a flying car today, all is lost (Score:5, Insightful)
Sssssh. (Score:2)
Speech recognition and translation are becoming highly effective and proficient tools for the US military. You see, it fits in your iPod... and
is there any evidence for this analysis? (Score:4, Insightful)
I see a lot of claims, but not much evidence. If we're going to use perceptions and anecdotes as evidence, my impression is that speech recognition has always been considered vaguely stalled. In 2000, people didn't think much progress had been made since 1991 besides some commercialization of stuff academia already knew how to do. In 2010, this guy doesn't think much progress has been made since 2001 besides some commercialization of stuff academia already knew how to do. Yet I think some progress has been made over the past 20 years. There just haven't been any breakthroughs, which is maybe what he's expecting, given his vague suggestion that "AI", a pretty vague concept, is our hope.
I'm also skeptical that accuracy has flatlined, though it's possible that's true in some areas. My impression is that multi-speaker recognition, use of large corpora to improve accuracy, and use of language modeling to improve accuracy, have all improved [google.com] over the past 10 years. Of course, not all improvements go everywhere: the speech recognition running in real-time on a mobile ARM processor is not using every possible state-of-the-art technique. The advance there is that you can run speech recognition in real-time on a mobile ARM processor at all, and get performance that was once only possible on pretty hefty workstations.
No it doesn't (Score:3, Interesting)
It works great for small vocabularies on your cell phone
No. It doesn't.
It works great for small vocabularies on your cell phone if you happen to live in the same neighbourhood as the developer, where "everyone talks this way". For the rest of the world, attempting to talk with a nasal American twang in order to get the phone to understand you is shit.
Blame Star Trek (Score:5, Insightful)
Blame Star Trek for making it look flawless. Speech recognition is just like fusion technology: 20 years away from properly working, just like it has been for the last 20 years.
-RANT- I can't stand voice recognition systems that don't at least give you an option to press a number, especially when they are out of tune and pick up background noise as voice. Please, please, please - always give the option to press a number instead of having to voice everything!!
yale-in-ox-boom-i-crows-off (Score:2)
Yay Linux! Boo Microsoft!
I win! Give me all your speech recognition monies.
Wait, what do you mean you don't believe I'm an AI? ... er, I mean ... Wait, what do you mean you do not believe I am an Artificial Intelligence?
IBM? (Score:2, Funny)
Re: (Score:2)
Re:IBM? (Score:5, Interesting)
IBM closed many of their speech research offices 1-2 years ago and transferred most of the research/data to Nuance's Dragon Naturally Speaking research.
Full Disclosure: I work for Nuance
Tea, Earl Grey, Hot (Score:5, Funny)
Re: (Score:3, Interesting)
by definition the second phrase eliminates any remaining use cases after the count down finishes.
Re:Tea, Earl Grey, Hot (Score:4, Funny)
Shout-outs to two idiots (Score:5, Insightful)
This blog post is retarded. The author is correlating a drop in internet news articles about Dragon NaturallySpeaking with a flatlining of speech recognition accuracy rate.
The Slashdot editor Soulskill is retarded for both not realizing this and for not reading the anonymously-submitted blog post (hmm no way it could have been the author) before approving it for the Slashdot front page. The guy is just out for more traffic to his rather pointless tech news commentary blog.
Decline of Slashdot, internet signal-to-noise ratio, get off my lawn, etc.
no, it doesn't work on cell phones, either (Score:2)
This is the reason millions of Americans are faster with the thumb than Buddy Rich with the drumsticks... you can't see their fingers move as they type 30 zeroes in a row to escape the mumblebots.
Data Input (Score:2)
My understanding, from people who use Dragon, is that it competes well against paying someone else to type. First, it is a couple of orders of magnitude cheaper. Second, if
Dear Aunt, (Score:2)
Forget speech recognition.... (Score:3, Funny)
I'd settle for a grammar checker. From the fine summary:
"Even where data are lush"
A good one would have saved this summary from sounding stupid.
Re: (Score:3, Informative)
There is nothing wrong with that phrase.
Re:Forget speech recognition.... (Score:4, Insightful)
Though I admit, the treatment of "data" as a mass noun (the likes of which take the copula "is" as well) is common enough that it did sound jarring to my own ear, even knowing it was technically correct.
Re:Forget speech recognition.... (Score:5, Informative)
The word "data" pluralizes "datum." "Data are lush" correctly pluralizes the singular form of the sentence.
Now who sounds stupid?
Wrong problem (Score:2, Interesting)
Right now the problem being solved is audio->text. This is the wrong problem, and why the results are so lame. The real problem is audio+context->text+new context. This takes some pretty intelligent computing and not the same old probabilistic approaches.
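One way to picture the "audio + context -> text + new context" framing is as a stateful decoding interface. The sketch below is purely hypothetical; the Context fields and the decode() signature are illustrative, not a real API.

```python
# Hypothetical interface sketch: the decoder consumes its prior context and
# returns a transcript plus an updated context, rather than mapping audio to
# text in isolation.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Context:
    topic: str = "general"
    recent_words: List[str] = field(default_factory=list)

def decode(audio_frames, ctx: Context) -> Tuple[str, Context]:
    # A real decoder would bias its hypotheses toward ctx.topic and
    # ctx.recent_words instead of treating each utterance independently.
    text = "placeholder transcript"
    new_ctx = Context(topic=ctx.topic,
                      recent_words=ctx.recent_words + text.split())
    return text, new_ctx
```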
Watermelon Box (Score:5, Insightful)
Long ago - decades ago, before Bill Gates was invented - a lot of research went into what would be required for actual voice recognition.
A counterexample was given, about an engineering marvel (of the time) that would recognise when someone said the word "watermelon". For a long time, people in the industry assumed that the path to voice recognition consisted of building more and better watermelon boxes.
Several authors, including Alan Turing himself, argued that actual voice recognition could never be accomplished with a large array of watermelon boxes. Current VR software divides input into a series of hyperplanes, and attempts to build a best match from the classification tree.
This is the 2010 version of the watermelon box.
Real voice recognition won't be practical until the input is parsed, matched against context, and structured much akin to diagramming a sentence in those old English (or other) classes. In short, matching against a vocabulary is trying to solve an exponential problem with a (large) polynomial engine.
It won't be until the computer actually understands what is said that VR is likely to be practical in a global sense.
As a person who has been building computer systems for 35 years, it bothers me to see a huge body of research done into subjects like these ignored, because someone thinks that none of it applies to PC's.
Re: (Score:3, Informative)
People were doing symbolic context recognition in the 60's-80's (look up frames). This went out of vogue with the use of neural nets and statistical recognition in the late 80's and continues up to this day. The problem is that getting better now probably needs new probabilistic models for symbolic context recognition, feeding up from statistical recognition of phonemes and words, feeding forward to later phrases being parsed. This would require either two teams, or one team with expertise in both areas.
Cod am pizza ship (Score:3, Funny)
Obligatory UF [userfriendly.org]
Rest in Peas (Score:3, Interesting)
I know it's just an imaginary example of how bad speech-to-text is... but it is realistic and disappointing.
Even an idiot like me knows what a Markov chain [wikipedia.org] is. Perhaps the standard voice apps are so entrenched that their vendors aren't recoding them to take advantage of the huge leaps in memory capacity since they first went on sale.
Do other languages fare just as bad... (Score:5, Insightful)
English, I would think, is a pretty daunting language for speech recognition, what with its substantial array of homophones, but I wonder if other languages fare better. Maybe Spanish or, say, Japanese would do better, since (I'm guessing) there is a closer relation between the written script and the actual sound it makes.
Philosophers, "we told you so". (Score:3, Insightful)
I have been flamed more than a few times around here for suggesting that computer science hasn't got a clue what it is doing when it comes to AI. Philosophy has been at this problem, and more, in a serious way for the better part of the last 400+ years (more like 1,000 years). The stock b.s. I get from the science-fiction fanboys is that somehow natural language is a problem that can just be brute-forced, as if you were trying to figure out the password you forgot to your email account. Good luck with that.
By the way, language "recognition" by a computer is likely the easy part of the problem for AI researchers to crack. It is still not going to yield any real AI, just better cars and toasters.
Speech recognition tech is broken in many ways (Score:5, Informative)
When I started on my Ph.D., I started out majoring in AI. One of several reasons I changed to computer architecture (CPU design, etc.) is because I just couldn't stand the broken ways that people were doing stuff. Actually computer vision stuff isn't so bad -- at least there's room for advancement. But the speech recognition state of the art is just awful. I couldn't stand the way they did much of anything in pursuit of human language understanding.
With automatic speech recognition (ASR), the first problem is the MFCCs (Mel-frequency cepstral coefficients). What they essentially do is take a Fourier transform of a Fourier transform of the data. This filters out not only amplitude but also absolute frequency, leaving you only with the relative pattern of frequencies. Think of this as analogous to taking a second derivative, where all you get is acceleration, leaving out position and velocity. You lose a LOT of information. Then once the MFCCs are computed, they're divided up into the top 13 (or so) dominant MFCCs, plus the first and second step-wise derivatives, giving you a 39-D vector. Then the top N most common ones are tallied and code-booked, mapping the rest to the nearest codes, leaving you with a relatively small number of codes (maybe a few hundred).
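For concreteness, here is roughly what that front end looks like in Python, assuming librosa and scikit-learn are available; "utterance.wav" is a hypothetical input file.

```python
# 13 MFCCs plus first and second deltas give 39-dimensional frame vectors,
# which a codebook (here, k-means) quantizes into discrete codes.

import numpy as np
import librosa
from sklearn.cluster import KMeans

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, n_frames)
d1 = librosa.feature.delta(mfcc, order=1)
d2 = librosa.feature.delta(mfcc, order=2)
frames = np.vstack([mfcc, d1, d2]).T                 # shape (n_frames, 39)

# In a real system the codebook is fit on a whole training corpus, not a
# single utterance; this just shows the quantization step.
codebook = KMeans(n_clusters=min(256, len(frames)), n_init=10).fit(frames)
codes = codebook.predict(frames)                     # one discrete code per frame
```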
So to start with, the signal processing is half deaf, throwing away most of the information. I get why they do it, because it's speaker independent, but you completely lose some VERY valuable information, like prosodic stress, which would be very useful to help with word segmentation. Instead, they try to guess it from statistical models.
Next, they apply a hidden Markov model (HMM). Instead of inferring phones directly from the signal, they model the speech as a sequence of hidden states (the phones) that cause the observations (the codes). This statistical model seems kind of backwards, although it works quite well when trained properly. To train it, you need a lot of labeled data, where people have taken lots of speech recordings and manually labeled the phonetic segments. What is usually learned is mostly a unigram, where what you know are the a priori probabilities of each phone label (the hidden states), plus the probability of each phone given each possible prior phone. Given a sequence of codes, you find the most likely sequence of phones by computing the Viterbi path through the HMM.
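A generic textbook Viterbi decoder over such an HMM might look like the sketch below (not the specific system described above; all model parameters would come from training on labeled speech).

```python
# Generic Viterbi decoding over a phone HMM. start_p, trans_p, and emit_p are
# numpy arrays a real recognizer would estimate from manually labeled speech.

import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """obs: frame codes (ints); start_p: (S,); trans_p: (S, S); emit_p: (S, O)."""
    S, T = len(start_p), len(obs)
    logv = np.full((T, S), -np.inf)          # best log-probability per state
    back = np.zeros((T, S), dtype=int)       # backpointers
    logv[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(S):
            scores = logv[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(scores))
            logv[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    # Trace the best path backwards from the most likely final state.
    path = [int(np.argmax(logv[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]                        # most likely phone-state sequence
```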
Honestly, I can't complain too much about the HMM. What I do complain about is the fact that the "cutting edge" is to replace the HMM with a Markov random field (just remove the arrows from the HMM), or with conditional random fields (which are Markov random fields with extra inputs).
My response to using MRFs and CRFs is "big whoop", because all you're doing is replacing the statistical model, which doesn't dramatically improve recognition performance, because they haven't fixed the underlying problem with the signal processing.
Then on top of the phone HMM, they layer ANOTHER HMM to infer words and word boundaries, based on a highly inaccurate phone sequence.
The main problem with all of this is not that the researchers are idiots. They're not. The problem is that the people with the funding are totally unwilling to fund anything really interesting or revolutionary. The existing methods "work", so the funding sources figure that we can just make incremental changes to existing technologies. Which is wrong. Unfortunately, any radically new technology would be highly experimental, with a high risk of failure, and would take a long time to develop. No one wants to fund anything that iffy. As a result, all the scientists working in this area spend their time on nothing but boring tweaks of a broken but "proven", reasonably effective technology.
So I don't blame people for the conundrum, but I see no opportunity to do anything interesting, so I just couldn't stand studying it.
screw speech recognition (Score:3, Funny)
Re: (Score:2)
Considering the subject matter, I'd hope readers would be able to detect a play on words when they see one.
Nevertheless, it got your attention, didn't it?
Re: (Score:2)
Years ago I used viavoice on Warp4, and it had a pretty decend [sic] recognitation [sic] rate ..
Did you have to 'train' it to your voice using a script and a series of corrections, or did it have 'natural' speech recognition from the get-go, the way you do when you chat with a cashier at the supermarket?
Re:What are you talk'in about ? (Score:4, Insightful)
People want "human quality" speech recognition.
As if we're ever going to get away from training speech recognition programs when we train listeners every day when we speak. It's just that most people don't look at it as being trained, since we're so used to doing it.
I'm sure you have more trouble understanding someone with a thick Cockney or Scottish accent if you're from the Midwest US. You'd ask that person to repeat a few times, wouldn't you?
To expect speech recognition programs to *not* use training is to expect them to exceed human intelligence. Indeed, it's to expect such programs to be psychic.
--
BMO
Re: (Score:3, Interesting)
Only you talk like you. There is no archive of speech large enough to encompass every speaker of a language, except one that has a record of each and every speaker. And it still doesn't solve the teaching problem. The shotgun approach is problematic in many ways, most of all the size of the database; and you'd still wind up teaching the speech platform to figure out what accent you're using, because if you ask most people, they don't have any accent at all.
Actually, I think the solution would be to make personal
Re: (Score:3, Insightful)
No, I want to use a common dataset, like VoxForge, to train all software automatically. What I was saying is that people don't need training to talk to each person they meet. Generic background training works fine, and so it should for computers.
Re:What are you talk'in about ? (Score:5, Funny)
Years ago I used viavoice on Warp4, and it had a pretty decend recognitation rate ..
Looks like whatever you're using now ain't quite as good.
Re: (Score:2, Interesting)
That doesn't always work. When accents and the command of a language are so poor, you only get a few chances to ask, "Sorry, what did you say?" After asking three times, you either look like an asshole and/or give up and spend the next few minutes nodding and smiling while trying to parse what they said, hoping you get it right.
Which is why we need good speech-recognition and translation software. It's easy to
Re: (Score:2)
It would seem that people learn computers better than computers learn people.
Much like talking to someone with a poor grasp of your language, you try to make things simple and easy to understand.