
Is Speech Recognition Finally 'Good Enough'? 313

jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will experience results that are even more dramatic. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
This discussion has been archived. No new comments can be posted.

  • Hmmm.... (Score:5, Funny)

    by DoofusOfDeath ( 636671 ) on Friday May 18, 2007 @05:10PM (#19184747)

    Is Speech Recognition Finally 'Good Enough'?

    Is spinachry ignition rivaly gooery stuff? What the hell are you talking about?

    • Re:Hmmm.... (Score:4, Funny)

      by value_added ( 719364 ) on Friday May 18, 2007 @05:16PM (#19184857)
      What the hell are you talking about?

      Maybe he meant speech wreck ignition?
      • wreck a nice beach??
      • You're joking but... (Score:4, Informative)

        by thepotoo ( 829391 ) <thepotoospam@yah[ ]com ['oo.' in gap]> on Saturday May 19, 2007 @12:26AM (#19188189)
        You hit the nail on the head with that one. My sister uses Dragon Speak Naturally exclusively (she's dyslexic and can't type or read worth crap, so she has to use Dragon Speak Naturally and Kurzweil (screen reader)).

        Dragon requires MONTHS of training (literally), and even then it will make mistakes exactly like the one you noted. The plus side is that Dragon works pretty decently under WINE, but apart from their Linux "support", it's a complete mess.

        Screen readers aren't much better; they have the accuracy, but are hard to understand.
        For a little geeky fun, I had Kurzweil read a few English papers to Dragon. Even after some training, Dragon still couldn't get above 80% accuracy on a computer generated, 100% reliable, voice. Now that's just sad.

    • by parvenu74 ( 310712 ) on Friday May 18, 2007 @05:18PM (#19184881)
      I used to work for a company that has the words "new directions" in their name. When I told people where I worked I would make a rather long pause between the "new" and "directions" so as not to sound like I was saying something else. I wonder how this software would render it...
      • by sd_diamond ( 839492 ) on Friday May 18, 2007 @05:38PM (#19185169) Homepage

        I used to work for a company that has the words "new directions" in their name.

        Please tell me the first two words in the name weren't "Coming From".

      • by TrippTDF ( 513419 ) <{moc.liamg} {ta} {dnalih}> on Friday May 18, 2007 @05:50PM (#19185341)
        Reminds me of when the company "Pen Island" or "Mole Station Nursery" set up their domain names...
        • Re: (Score:3, Funny)

          by Anonymous Coward
          And let's not forget the Italian energy company Powergen Italia... their name makes for a wonderful .com address!
      • by account_deleted ( 4530225 ) on Friday May 18, 2007 @05:56PM (#19185385)
        Comment removed based on user account deletion
        • Any speech recognition software worth the $ should be able to detect and translate NATO letter names [wikipedia.org]: "hotel tango tango papa colon slash slash sierra lima alpha sierra hotel delta oscar tango dot org".
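For what it's worth, the translation half of that is the easy part once the recognizer has produced the words. Here is a minimal sketch of such a post-processing step, assuming the engine hands back plain lowercase-able text with the standard spellings "papa" and "lima". The function name `decode_spelled` and the `SYMBOLS` table are made up for illustration and do not come from any real product.

```python
# Hypothetical post-processing step: turn recognized NATO letter names and
# spoken punctuation into the literal characters they stand for.
NATO = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d", "echo": "e",
    "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i", "juliett": "j",
    "kilo": "k", "lima": "l", "mike": "m", "november": "n", "oscar": "o",
    "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s", "tango": "t",
    "uniform": "u", "victor": "v", "whiskey": "w", "xray": "x",
    "yankee": "y", "zulu": "z",
}

# Spoken symbol names; an illustrative subset, not a complete list.
SYMBOLS = {"colon": ":", "slash": "/", "dot": ".", "dash": "-"}


def decode_spelled(words: str) -> str:
    """Join letter names and symbol names into one string.

    Words that are neither a letter name nor a symbol name (such as "org")
    are passed through unchanged.
    """
    out = []
    for word in words.lower().split():
        if word in NATO:
            out.append(NATO[word])
        elif word in SYMBOLS:
            out.append(SYMBOLS[word])
        else:
            out.append(word)
    return "".join(out)


if __name__ == "__main__":
    spoken = ("hotel tango tango papa colon slash slash "
              "sierra lima alpha sierra hotel delta oscar tango dot org")
    print(decode_spelled(spoken))
```

The hard part, of course, is knowing when the speaker has switched into spelling mode, which this sketch does not attempt.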

        • Re: (Score:3, Funny)

          by Instine ( 963303 )
          People want machines to be better than people. They still have this 'infallibility' hang-up. That a machine is more deterministic, and thereby, is either right or wrong. I'm not stupid, but for a bit, when people said "/. was worth looking at" in blogs or wherever, I actually wondered how I'd find it. Then when I finally heard someone say "slash dot" I kept trying URLs with hyphens. Not for long, and clearly I've found it. But for weeks I was intrigued by /. but couldn't figure out where to look (you can't g
      • Re: (Score:2, Funny)

        by Champ ( 91601 )
        Call me immature but I still get a mild chuckle out of, er, expert-sex-change-dot-com and part-sex-press-dot-com. Wait, what's the other part?
      • It's a coffee company, but it sounds like he's peddling homoerotic publications. *shudder*
      • We were testing an edition of Dragon Naturally Speaking back in 2000, when an Asian-American woman on our team took the microphone. She had a heavy accent, and the software interpreted her words as... nothing.

        She stood there, trying to get it to write something, and finally ended up repeating, "It not woking! Why it is not woking?"

        We were afraid to laugh, fearing a trip to HR... we all stood there, biting the insides of our cheeks, until she gave up and left the room; then, we collapsed on the floor, li
    • Is Speech Recognition Finally 'Good Enough'?

      Funny, when I dictated this sentence to my computer today, it came out "Is Slashdot's Shameless Plug Recognition Finally 'Good Enough'?"

      Today somebody at Dragon got moved to a corner office.

      • Re: (Score:3, Insightful)

        by bearinboots ( 743355 )

        Dragon is no more... and hasn't been for a long time.

        NaturallySpeaking has been sold a few times to various companies.

        (I keep track because I worked on V1.0)

    • Re: (Score:2, Insightful)

      I'll be honest with you, Vista is way better at coming up with hilarious new Madlibs than you are.
    • Several years ago, I saw a court reporter using a speech recognition system with his laptop. The microphone actually looked like some sort of breathing apparatus, as it fit snugly over his mouth and nose with the wires in a tube running down to the laptop.
    • Is Speech Recognition Finally 'Good Enough'?

      Is spinachry ignition rivaly gooery stuff? What the hell are you talking about?

      That's a great one! Here are a few of *MY* favorites:

      1. Nature'll anguish; wreck ignition.
      2. Our feet are stayin'.
      3. A river and a ditch.
      4. Mercy buckets.
      5. Bone chewer.

      The translations:

      1. natural language recognition
      2. Auf wiedersehen (German: "good-bye")
      3. arrivederci (Italian: "good-bye")
      4. merci beaucoup (French: "Thank you very much")
      5. bonjour (French: "hello")

      These are all I can rem

    • The classic example used in my AI class to describe the problem of getting a computer AI to... recognize speech.

      Though to me the problem with dictating text (the obvious use for speech recognition) is the need for some kind of escape for punctuation or program control. I mean, you can't just say "I went to the store period select all cut" because even assuming it recognizes all the words perfectly it wouldn't know if the "period" is supposed to be a word or punctuation, and either way it "assumes" you'd ne
  • Problems (Score:5, Insightful)

    by Tribbin ( 565963 ) on Friday May 18, 2007 @05:12PM (#19184773) Homepage
    As a foreigner it is really hard to get the pronunciation right enough.

    Also command execution by others in the room is a problem.

    How about listening to music, or TV, and having the computer interpreting it.
    • Re:Problems (Score:5, Informative)

      by Sciros ( 986030 ) on Friday May 18, 2007 @05:19PM (#19184887) Journal
      It all depends what sort of corpus the SR system is trained on. So yeah, foreigners will have problems because a system trained for, say, British English will not perform well with American English. For this same reason an SR system trained for "normal" speech will do very poorly with lyrics in music.

      As for stuff like "i really admire your analysis" being interpreted as "i really admire urinalysis," that stuff can easily be ironed out by an n-gram based system that "ranks" English sentences based on probability. What is the chance that "urinalysis" will follow "your" which follows "admire"? Such things can be estimated well enough if you use a large corpus to train your n-gram system (as long as the corpus you're using for this is the same "kind" as whatever speech the SR system is interpreting -- that is, newswire, business meeting, etc.)
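A toy version of that ranking idea, as a rough sketch only: a bigram model with add-k smoothing trained on a five-sentence corpus invented for this example. The corpus, the smoothing constant `k=0.1`, and the function names are all assumptions, and a real recognizer would combine a score like this with an acoustic score rather than use it alone.

```python
import math
from collections import Counter

# Tiny made-up training corpus; a real system would use millions of sentences
# of the same "kind" as the speech being recognized.
CORPUS = [
    "i really admire your analysis",
    "your analysis of the data was thorough",
    "we admire your work",
    "the lab ordered a urinalysis",
    "the urinalysis came back normal",
]


def train_bigrams(sentences):
    """Count bigrams and their left-hand contexts, with sentence markers."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, contexts, bigrams, vocab_size, k=0.1):
    """Log-probability of a sentence under the add-k smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + k
        denominator = contexts[prev] + k * vocab_size
        total += math.log(numerator / denominator)
    return total


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    for hypothesis in ("i really admire your analysis",
                       "i really admire urinalysis"):
        print(f"{log_prob(hypothesis, *model):8.3f}  {hypothesis}")
```

On this corpus the "your analysis" reading scores higher because "admire urinalysis" was never seen. Note that the shorter hypothesis has one fewer factor in its product, so with heavy smoothing (say k=1) on a corpus this small the ranking can flip, which is one face of the data-sparseness problem.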
      • by Sciros ( 986030 )
        By the way, what I described is referred to as the "Language Model" component of a natural language processing system. I'm sure Nuance uses one, so whatever errors it makes are probably a result of data sparseness during training.
      • also helps if you don't pronounce it "yer 'nalysis" (i know, i live in OKRAHOMA)
      • As for stuff like "i really admire your analysis" being interpreted as "i really admire urinalysis," that stuff can easily be ironed out by an n-gram based system

        Please tell me you're not talking about engrams in Dianetics [wikipedia.org].

        that "ranks" English sentences based on probability.

        That, or just have speakers adopt the German habit of pronouncing a glottal stop before words that start with a vowel. (A glottal stop is the sound in the middle of "uh-oh".) The test phrase would sound like this [jk0.org], and such a habit would help to disambiguate "your analysis" from "urinalysis" even for a medical transcriptionist, whose language model may have been overtrained with "urinalysis".

    • by lawpoop ( 604919 )

      Also command execution by others in the room is a problem.

      How about listening to music, or TV, and having the computer interpreting it.
      I think a noise canceling microphone would take care of those problems.
  • Dear aunt, let's set so double the killer delete select all.
  • Sure (Score:3, Funny)

    by springbox ( 853816 ) on Friday May 18, 2007 @05:13PM (#19184789)
    In fact, I'm using it to write this Dear aunt, let's set so double the killer delete select all
  • by orclevegam ( 940336 ) on Friday May 18, 2007 @05:13PM (#19184791) Journal

    Is Speech Recognition Finally 'Good Enough'?

    For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

    • by Mahjub Sa'aden ( 1100387 ) <msaaden@gmail.com> on Friday May 18, 2007 @05:31PM (#19185079)
      Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?

      Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.
      • Re: (Score:3, Insightful)

        Speech recognition is obviously not universally usable, but it is useful. I've found that for many mundane tasks, the OS X speech recognition is easier than a keyboard shortcut, and much easier than using the mouse. There are a lot of applications that could be much easier if they included speech recognition for commands. Consider an app that relies heavily on both keyboard and mouse input, such as Blender. A lot of the keyboard shortcuts would be faster and easier to remember as spoken commands, and they c
      • Re: (Score:3, Insightful)

        by babblefrog ( 1013127 )
        Where I see it coming into its own is as an input method for really portable "wearable computing", where it would be extremely inconvenient to use a keyboard.
      • Re: (Score:3, Insightful)

        by AJWM ( 19027 )
        I mean, is it good enough... to do what?

        Oh, how about eavesdrop on a few thousand voice circuits and raise a flag when certain key words or phrases are mentioned?
    • Re: (Score:2, Insightful)

      programming with voice recognition just seems stupid to me. The idea behind voice recognition is to make it easier to write natural speech, such as email, or an essay, or anything else that follows normal speech patterns. Programming is writing so a computer can understand what you want it to do. It involves TONS of punctuation, oddly named keywords and variables (var, int, _InitBlockPosX). Hell, I can barely read my code aloud to someone else without confusing MYSELF, much less confusing the other huma
    • For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

      Probably because computer languages aren't designed for dictation. It would be interesting, however, if a language were designed for spoken programming rather than typing. What would that look like -- errr, sound like? Code-reviews might get a little wacky though (I'm hearing voices in the computer!).

    • Why would you want to? I spend more time thinking about it than typing it anyway. It's not like speech, where you don't think about the words. I'm sure I'd hate being like "def getstr... no, getvaria... erm, gettype".
    • I don't mind the errors, what I do mind is taking my time out to correct them.

      While typing, if I make a typo or something - I either ignore the few wrong letters, correct them really fast (takes a second or two), or the spell checker does it for me. All in all, I am still concentrating on what I was doing.

      I have tried Dragon Naturally Speaking ver 5, 7, and the latest one, 9sp1. It really has gotten better throughout the generations but when I dictate a document and something comes out bad - it's an entire
  • by account_deleted ( 4530225 ) on Friday May 18, 2007 @05:13PM (#19184795)
    Comment removed based on user account deletion
    • Speech recognition, handwriting recognition, species recognition... all of these suck, and will CONTINUE to suck, until strong AI is developed.

      I dunno... I mean, the Newton with six months of training had around 98% accuracy... Inkwell's based off the same algorithms, albeit tweaked slightly to accommodate the different input peripherals. I bet with a year of real research/development, Apple could take handwriting recognition off that list.
  • by ral315 ( 741081 ) on Friday May 18, 2007 @05:13PM (#19184797)
    I use it myself. It's wonder full. delete that. delete that. delete that. double the killer delete select all
  • "Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."
    • by Tackhead ( 54550 ) on Friday May 18, 2007 @05:26PM (#19185019)
      > "Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."

      Yeah, but if you put a beat to it, you've got something.

      { } . ! /
      & ; ^ # -
      < > @ \
      { } _ SYSTEM HALTED

      "Left titty, right titty, dot bang slash.
      Ampersand semicolon, caret pound dash.
      Less than greater than, at back slash,
      left titty, right titty, under score crash!"

      * # ! ! (
      ~ & | )
      ' " . . DEL
      # ^G ! ! working... done.

      "Star pound bang bang, open-paren.
      Tilde and pipe, close-paren.
      One quote, two quote, dot dot delete,
      pound bell, bang bang, process complete!"

      Google's USENET archive dates it back to 1990, but it predates the 1990 post ("Stuck Shift Key Poetry") to rec.humor.funny by several years.

      You haven't lived until you've seen a dozen drunken geeks trying to sing "Waka Waka", or the entirety of "Hatless Atlas", while seeing only one character at a time. Well, maybe you have, but this is Slashdot.

  • With some of the stuff that I see on the Internet (websites and blogs etc.) I'd have to say that the urinalysis gaffe isn't really all that bad.

    The only place that speech recognition really annoys me is phone answering systems. They are not competent enough to let you concatenate menu item options and make an intelligent choice as to which phone queue to put you in. For example:

    "I have trouble with my cable modem dropping packets" is a statement that 'SHOULD' get you put through to the second tier support li
    • Re: (Score:3, Insightful)

      by RingDev ( 879105 )
      To be fair, that's a problem with the IVR coder, not the voice recognition engine.

      -Rick
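To illustrate the point that routing is the IVR application's job rather than the recognizer's, here is a rough sketch that scores a whole recognized utterance against keyword sets instead of waiting for one exact menu word. The queue names, keyword lists, and the `route` function are invented for this example and are not taken from any real IVR platform.

```python
import re

# Hypothetical queues and trigger words; a real deployment would tune these
# from call logs rather than hard-code them.
QUEUE_KEYWORDS = {
    "billing": {"bill", "billing", "charge", "payment", "invoice"},
    "tier2_support": {"modem", "packets", "dropping", "latency", "router"},
    "new_service": {"new", "install", "signup", "upgrade"},
}


def route(utterance: str, default: str = "operator") -> str:
    """Pick the queue whose keywords overlap most with the utterance.

    Falls back to a human operator when nothing matches at all.
    """
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {queue: len(words & keywords)
              for queue, keywords in QUEUE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(route("I have trouble with my cable modem dropping packets"))
    print(route("Human human human!"))
```

Falling back to an operator when nothing matches is a design choice made here for the sketch; the menu loop in the sibling comment is what you get when the fallback is "repeat the prompt".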
    • by poptones ( 653660 ) on Friday May 18, 2007 @05:26PM (#19185017) Journal
      Press or say one to speak with a representative in english...

      One

      When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new service, say new customer. If you are...

      Billing

      I'm sorry, that is not an option. When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new...

      Billing!

      I'm sorry, that is not an option. When you hear the option...

      Billing billing billing!

      I'm sorry, that is not an option. When you...

      Fuck you! Give me a human! Human human human!

      I'm sorry, that is not an option. When you hear the option...
      • Unless I call a number where I expect an automated system, the first thing I do is press and hold the 0 button for about 10 seconds.

        I'm usually talking to a real person within a minute or so.
    • Some jackass tells non-tech people to use it to get tier 2 help.

      Probably the same jackass that told people about the Internet.
  • For those of us with serious RSI and who program/sys admin for a living, are there any serious attempts at voice recognition out there? Specifically, have there been any breakthroughs with speech -> symbol names or obscure shell commands?
  • by traindirector ( 1001483 ) * on Friday May 18, 2007 @05:15PM (#19184841)

    TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice

    • when you need to enter hand-written documents into a computer
    • for transcripts of a single speaker
    • informal free-thought when not surrounded by other people
    • when you have horrible typing skills

    For the majority of office tasks, it just isn't a good fit.

    So if the "good enough" is being useful in any way whatsoever, it sounds like we're almost there.

    • by L. VeGas ( 580015 ) on Friday May 18, 2007 @05:20PM (#19184909) Homepage Journal
      These are some good points. I don't know what I would use speech recognition for, and I'm someone that writes a lot.

      Seeing words laid out as text helps me think. I can compose things better, more coherently.

      I'll write an email in an instant, but make me leave a voice mail, and I'll usually hang up first.
    • Re: (Score:3, Insightful)

      by RingDev ( 879105 )
      I would love it for a graphics editor. Being able to swap tools, zoom, bring up palettes, etc... without having to go digging through menus or trying to remember hot keys. I think VR in desktop software has a place, but it is in augmentation, a fringe benefit, not the core functionality.

      -Rick
      • Limited-vocabulary speech recognition has been working -- without training -- for years now. Suitable engines are built-in to OS X and MS Office, and there are several choices for Linux as well. Not all tools provide good programmatic access, so it may or may not be easy to integrate with your favorite tools, but the actual speech recognition part is there.
    • Mod parent up! (Score:4, Insightful)

      by Doctor Memory ( 6336 ) on Friday May 18, 2007 @05:43PM (#19185243)
      Seriously, the only things speech recognition is good for are bulk text entry and simple navigation. I imagine trying to use voice commands to operate modern software would be similar to letting my four-year-old help make pancakes — yes, it gets done, but it's so much easier and faster to just do it yourself. Imagine trying to edit a document using just voice commands. Is your WP going to be smart enough that you can tell it "find all occurrences of 'scum-sucking bottom feeders' and replace it with 'esteemed colleagues'"? Or are you going to have to say "Find. Scum hyphen sucking bottom feeders. Tab. Esteemed colleagues. Replace all."? Face it, GUIs have rendered speech recognition for command and navigation moot. Most operations you perform don't have a verbal description, or at least not one that is quicker to say than to do.

      I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better when I write things out by hand, because it's slower so I tend to think my phrasing and sentence structure through more before I commit anything to paper. If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...
      • Re: (Score:3, Interesting)

        * when you need to enter hand-written documents into a computer
        * for transcripts of a single speaker
        * informal free-thought when not surrounded by other people
        * when you have horrible typing skills

        You had me at "* when you have horrible typing skills".

        Parent post mentions their 4 year old making pancakes.
        At some point, most likely, you expect the kid is going to grow up and get better at making pancakes. There will be
    • Re: (Score:3, Interesting)

      Excellent points. One only need consider how much computer usage is done in cubicle farms, and then picture everyone chattering "Scratch that!" at their workstation, and the utility of speech recognition as a primary form of input becomes very limited regardless of its accuracy. I have a copy of Dragon, and its accuracy is really quite impressive, but past the novelty I have almost never used it. Other than the fact that it requires virtual silence (aside from your voice) to operate, unless I already kno
    • by nine-times ( 778537 ) <nine.times@gmail.com> on Friday May 18, 2007 @06:50PM (#19185961) Homepage

      informal free-thought when not surrounded by other people

      I think you're implying something here that is one of the major reasons people don't use speech recognition software: if anyone is around, you feel like a total moron.

      You might not realize this, but you probably speak differently than you write. Most of us do, because there are some things that look good in text that sound bad spoken, and vice versa. Also, a lot of composition goes on when writing, and so if you're playing with different word choices so you can see them written out, you just end up sputtering dumb little phrases. It's easier to edit on-the-fly when using a keyboard. And let's not forget that you might not want the people around you to know what you're writing.

  • I'm using Dragon NaturallySpeaking. Right now, as I write this calm it, comet, post, and it sure as hacking beats typing.

    Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way. I have bilateral tendinitis, and the software has been a godsend for me. I was even able to finish writing my book, a task that was becoming just too painful typing manually.

    Oh, and you are probably wondering how long it takes to train the software? About a half an hour, and I find the accuracy at around 95%.
    • Re: (Score:3, Informative)

      by Sciros ( 986030 )
      Yeah, Nuance makes good stuff. Well, they've bought up everyone worth anything afaik, so I guess it's only to be expected.
    • Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way.


      What does "inserting grammar" mean?
      • What does "inserting grammar" mean?

        It means adding commas and periods as you speak to make the text read more natural.
  • I work on IVR systems for clinical research and medical screening (along with a huge variety of other things we make these systems do). And it's pretty good. We do a lot of work massaging the Grammars to make the system more accurate though, and we have a lot of extra logic built in for situations where we can predict values and assign weights to different words. But the one thing that rather annoys me is that I quite often have issues with Skype's quality just being a bit too low for the system to pull off.
  • Pretty good (Score:5, Funny)

    by Richard McBeef ( 1092673 ) on Friday May 18, 2007 @05:21PM (#19184917)
    95 percent is pretty good, only one word in twenty. I wouldn't have a problem with a 5% error ate.
    • Re: (Score:3, Insightful)

      by Rei ( 128717 )
      5% could be the difference between "The report confirmed that Iraq has WMDs" and "The report confirmed that Iraq had WMDs." It could be the difference between "Tell Mrs. Smith to take 20mg of neurontin" and "Tell Mrs. Smidt to take 20mg of neurontin." It could be the difference between "The magnet should not be exposed to a field greater than fifteen teslas" and "The magnet should not be exposed to a field greater than fifty teslas." And on, and on.

      Small wording changes can make a big difference -- gener
      • That is precisely the problem with Dragon - the algorithm by its very nature will not create typos - it is matching speech against known words. So it is helpless with new vocab (although you can train it) and it makes for devilish subtle typos that take longer to pick out than it would have to run a spell check after a typist finished with their 93% accuracy.
        • Re: (Score:3, Insightful)

          by TheRaven64 ( 641858 )
          I wonder exactly what 95% means. Does it mean one character out of every 20 is wrong? One word out of every 20 has an error? One sentence? I average about one to two errors per page, and so all of these sound horrendous to me. Even typing with my eyes closed (which I do sometimes when my eyes are feeling tired, but generally don't because I always think I've managed to move my fingers one character across and started typing complete nonsense) I get higher accuracy than that.
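One common way to pin this down is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words actually said. Whether the 95% figure in the article was measured this way is an assumption on my part, since the summary doesn't say. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Counts substitutions, deletions and insertions of whole words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "i really admire your analysis"
    heard = "i really admire urinalysis"
    print(word_error_rate(said, heard))
```

For the urinalysis example this gives 2 errors over 5 reference words (one substitution plus one deletion), so a single misheard phrase costs 40% on that sentence. By this measure "95% accurate" would mean roughly one wrong word in twenty, not one wrong character.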
  • We use it. (Score:3, Interesting)

    by Organic Brain Damage ( 863655 ) on Friday May 18, 2007 @05:21PM (#19184923)
    For command control of a system where we need both hands free. It's pretty good, much better than stopping and typing, clicking or pressing buttons during a repetitive manual process.

    We're using an older version of Microsoft's product and it seems the microphone quality is important.
    • by cs02rm0 ( 654673 )
      Likewise, for our main product we've integrated Dragon for command and control. It's faultless there, even without training. It's 'good' in general use, but that doesn't really cut it for anyone who can touch type.
  • For some reason every time this topic comes up the focus seems to shift to word-processor-type use.

    What about simpler uses? How many basic tasks in the car require you to take your hands off the steering wheel? I'd like to see the basic functionality of the remote control mirrored in speech recognition. Things like stop/pause/increase/skip.

    I'd imagine once this kind of simple recognition became common over-all speech recognition would (more) rapidly evolve.

    • What about simpler uses? How many basic tasks in the car require you to take your hands off the steering wheel?

      Zero.

      A very few may require taking a hand off the steering wheel, though well designed newer cars tend to solve even that by putting controls on the wheel.

      Though to solve the major "hands off the wheel" problem I've seen in other drivers, I'm not sure how voice control would work, anyhow: are you proposing a voice-controlled makeup application system?

  • ...so everyone will talk all the time, half the work population will go postal and the other half will get offices. Also one thing that I notice is that I rarely get everything right the first time, I go back to add a sentence or use copy-paste quite a bit. It's really much easier to do that with your fingers without losing the "verbal" line of thought. And all the applications where it makes much more sense to use the UI than trying to talk your way through commands, voice commands get a bit like the ocmma
  • by __aajwxe560 ( 779189 ) on Friday May 18, 2007 @05:39PM (#19185183)
    I am presently a financial customer of an enterprise speech recognition product that Nuance offers. For several years now, the speech recognition software industry has been under consolidation, with Nuance buying a few different competitors and technologies. Most recently, this dance has continued with Nuance being acquired by ScanSoft, a company known for specializing in type recognition.

    Nuance support is marginal at best, and through all the consolidations, understanding even within their own company of how the product works is quite lacking. We have found our own developers often times educating the Nuance support folks in various aspects of how the product is working, and then inquiring as to whether this is intended behavior or not. Crickets can often be heard finishing these types of conversations. We normally would have moved to another product under these conditions, but simply put - Nuance acquired what little was left, and now has no competition in the market. Competition is what spurs innovation, and so with the continued consolidation, it is hard to see significant advances in the technology without free help from academia.

    If you think the Microsoft monopoly is bad, imagine if they absorbed Apple and somehow took over Linux leaving you with a few "choices", but all under the Microsoft moniker. The technology is very neat and the enterprise level products do some basic things quite well, but there is still some glaring room for innovation that I don't expect anytime soon under present industry conditions.
  • This is really apples & oranges. The typist with 93% accuracy will produce a document with some typos, and I can tell you from years of reading /. that typos are easily "corrected" by the reader if the typist doesn't catch them. Even at that, spell checkers catch quite a few of them, too.

    That's very different from "your analysis" turning into "urinalysis". Here, the spelling is correct but the words are completely wrong, and trying to figure out what is really meant will take a much longer reading of
  • About 5 years ago some manufacturer announced chips for under $5 that would do speaker-independent, limited vocabulary recognition and I predicted that there would be products appearing all over the place that would get rid of the crappy buttons and use speech as the interface. The only place I see it is in cell phones, and I always turn it off, because I don't want my cell phone surreptitiously calling someone while I am talking ABOUT them. Anyway, why hasn't the toy and gadget market latched onto speech
  • I use the speech recognition on my BlackBerry Perl^H^Hrl^H^Harl all the time and it's "good enough".
  • The holy grail for me has been software that deals with speech impediments. I stutter. I'm fluent enough that I function fairly well in real life (I'm a high school teacher), but speech recognition software has universally failed to meet my needs.

    All of the words have too many s's.

  • I'm using Dragon NaturallySpeaking 9 right now. I've been using it for several months, and I have written [softduit.com] a dozen articles on it. I think it works fantastic, but you definitely have to learn how to write all over again. Out of the box it trains extremely quickly, if you do not want to train it at all you can just start talking and it will eventually catch up with you. (Note it caught catch up and not ketchup) I started using it as a preventative means of avoiding repetitive stress injuries. I cannot u
  • Zeno's Translator (Score:4, Informative)

    by Carcass666 ( 539381 ) on Friday May 18, 2007 @08:28PM (#19186801)

    Speech recognition has been at a standstill for years now; it's been "almost there now" for well over five years. As mentioned in other posts, there has been a lot of consolidation and that has really hurt growth. Lernout & Hauspie and Dragon were constantly going back and forth a few years ago trying to get a leg up on each other. When L&H got into all of their accounting problems and shut down, that left Dragon and IBM. IBM's product went to Scansoft and then to Nuance, where it languishes until somebody pulls the plug (for example, if you call for support on ViaVoice and mention you have XP SP2, they will tell you it is not a supported platform).

    Most of the improvement in Dragon and ViaVoice over the last couple of years has been in the reduction of training required to get to the high-nineties level of accuracy (assuming a noise-cancelling mic in a quiet room and that you do not have a cold or sore throat). The advancements in training have not corresponded to much in the way of translation accuracy. A "trained" Dragon 7 recognizes speech pretty much as well as Dragon 9 (I haven't played with Dragon 10 yet).

    Most of the real speech recognition advancement these days is focused on discrete word sets for voice mail trees and other interactive systems. When you are on the phone giving your credit card number, two/to/too is all the same thing. While speech recognition in its current incarnation is good for people who can't type (disabilities, carpal-tunnel, etc.) it is not a replacement for typing, and isn't any closer today than it was five years ago.
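
A rough sketch of why a discrete word set makes two/to/too a non-issue: if the only valid answers are digits, every homophone can be folded onto the same digit. The word table and the function name below are illustrative assumptions, not taken from any real voice-response product.

```python
# Hypothetical lookup table: every spoken form that could plausibly be heard
# when the caller says a digit is folded onto that digit.
DIGIT_WORDS = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3",
    "four": "4", "for": "4", "fore": "4",
    "five": "5",
    "six": "6",
    "seven": "7",
    "eight": "8", "ate": "8",
    "nine": "9",
}


def words_to_digits(transcript):
    """Convert a recognized transcript into a digit string.

    Raises ValueError on any word outside the digit vocabulary, which is
    where a phone tree would re-prompt the caller.
    """
    digits = []
    for word in transcript.lower().split():
        if word not in DIGIT_WORDS:
            raise ValueError("not a digit word: " + word)
        digits.append(DIGIT_WORDS[word])
    return "".join(digits)
```

With a table like this, "four to too ate" and "for two two eight" come out as the same digit string, which is exactly the ambiguity that dictation cannot wave away.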

  • by Pedrito ( 94783 ) on Friday May 18, 2007 @08:46PM (#19186893)
    Speech recognition generally comes in two flavors: Command and Dictation. Most voice recognition engines can handle either, but the implementations are very different. Command mode is handled by providing a list of "command" words that are valid at any given point and operates much like a state machine. Dictation is a completely different beast and does a variety of things under the hood to increase accuracy.
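
To make the state-machine description of command mode concrete, here is a minimal sketch. The states, the command phrases, and the `run_commands` helper are all made up for illustration; no real speech engine's grammar format or API is being shown.

```python
# Hypothetical grammar: each state lists the command phrases that are valid
# in that state and the state each one leads to.
GRAMMAR = {
    "idle": {"execute browser": "browser"},
    "browser": {"google search": "browser", "close browser": "idle"},
}


def run_commands(phrases, state="idle"):
    """Feed recognized phrases through the grammar.

    Returns (final_state, accepted, rejected). A phrase that is not in the
    current state's list is rejected and the state does not change.
    """
    accepted, rejected = [], []
    for phrase in phrases:
        key = phrase.strip().lower()
        valid = GRAMMAR.get(state, {})
        if key in valid:
            accepted.append(key)
            state = valid[key]
        else:
            rejected.append(key)
    return state, accepted, rejected
```

Because the recognizer only has to choose among the handful of phrases valid in the current state, a misheard word tends to get rejected rather than turned into the wrong text, which is one plausible reason command mode reached "good enough" earlier than dictation.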

    "Good enough" is very vague as applied to voice recognition. For command stuff, "good enough" has been here for about 7+ years. Even MS's free engine does a great job at that.

    I used ViaVoice years ago and it worked pretty well. But here's the thing: Have you ever tried to dictate something? It's definitely a skill. I'm sure some people have a natural ability for it, but I certainly didn't. I tried dictating stuff and it's tough. You hit a pause mid-sentence trying to figure out how you want to phrase something and suddenly there's a period and you're beginning a new sentence. Try dictating several sentences of original material and keeping it going without pauses and "um"s and so forth and you'll see it's not quite as easy as it seems. I suspect one of the reasons voice recognition hasn't been a hit is that people don't expect that. They try it for a few days, think, "Hell, it's easier just to type," and give up. That's why I don't use it for writing. I can type faster and more accurately than I can dictate. I'm sure if it's something I wanted to work on, I could develop the skill, but my point is, I think that's probably why a lot of people give up on it.

    I honestly think that voice recognition in command mode could be really useful at speeding things up, if software were designed to take advantage of it. But it's not easy to add it as an afterthought and it adds significant work, even if it's done with forethought. It's a chicken and the egg thing. If a lot of software supported it, I think people would see a gain in productivity using whatever software they use daily. I don't mean just using voice recognition, but in combination with a mouse and keyboard. For example: "Execute Browser. google dot com. flying burrito brothers. google search". Saying that would be a pretty fast way of opening your web browser, typing "google.com" and then typing "flying burrito brothers" and then clicking the "Google Search" button. Replace "Google Search" with hitting the enter key and even faster.

    But as I said, it's a chicken and the egg thing. Software doesn't support it because there's no demand and there's no demand because people haven't really experienced software that supports it.

    Another issue (and I'm sure this has been mentioned by others), is background noise. I like to listen to music or watch TV while I work. Those don't mix well with voice recognition, at least not at the volumes I listen to them. Until voice recognition can get around that and recognize my voice amidst background noise and do it accurately AND software out there generally supports it, it's not going to go mainstream.
  • by otomo_1001 ( 22925 ) on Friday May 18, 2007 @09:16PM (#19187101)
    I mean really, until I can say to my computer things like:

    Find all mp3's that were created by Trent Reznor and pipe them to /dev/audio on the neighbors computer. What use will it be?

    I can't program in it can I?

    if(i_can_write_code_I_mean_speak_code_to_the_computer() == true) then
        i_might_use_it_a_bit();
    else
        system("find /music -type f -name \"*trent*reznor*\" | xargs -t cat - | ssh hackeduser@neighborcomputer \"cat - > /dev/audio\"");
    endif

    But that is just me.

"What man has done, man can aspire to do." -- Jerry Pournelle, about space flight
