Is Speech Recognition Finally 'Good Enough'? 313

jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, while speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will see even more dramatic gains. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task, and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • I'm using Dragon NaturallySpeaking. Right now, as I write this calm it, comet, post, and it sure as hacking beats typing.

    Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way. I have bilateral tendinitis, and the software has been a godsend for me. I was even able to finish writing my book, a task that was becoming just too painful typing manually.

    Oh, and you are probably wondering how long it takes to train the software? About half an hour, and I find the accuracy is around 95%.
  • Re:Problems (Score:5, Informative)

    by Sciros ( 986030 ) on Friday May 18, 2007 @05:19PM (#19184887) Journal
    It all depends on what sort of corpus the SR system is trained on. So yeah, foreigners will have problems, because a system trained for, say, British English will not perform well with American English. For the same reason, an SR system trained on "normal" speech will do very poorly on lyrics in music.

    As for stuff like "I really admire your analysis" being interpreted as "I really admire urinalysis," that can largely be ironed out by an n-gram based system that "ranks" candidate English sentences by probability. What is the chance that "urinalysis" follows "admire," compared with "analysis" following "your"? Such things can be estimated well enough if you use a large corpus to train your n-gram system (as long as that corpus is the same "kind" as whatever speech the SR system is interpreting -- newswire, business meetings, etc.).
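    The ranking idea above can be sketched in a few lines. This is a toy bigram model, not any real product's implementation: the counts are invented for illustration, and a real system would estimate them from a large corpus and use proper smoothing.

    ```python
    from collections import defaultdict

    # Hypothetical bigram and unigram counts "harvested" from a corpus.
    bigram_counts = {
        ("admire", "your"): 50,
        ("your", "analysis"): 40,
        ("admire", "urinalysis"): 1,
    }
    unigram_counts = defaultdict(int, {"admire": 60, "your": 55, "urinalysis": 2})

    def sentence_score(words, floor=1e-6):
        """Product of P(w_i | w_{i-1}), with a tiny floor for unseen bigrams."""
        score = 1.0
        for prev, cur in zip(words, words[1:]):
            count = bigram_counts.get((prev, cur), 0)
            total = unigram_counts[prev]
            score *= count / total if total and count else floor
        return score

    # The recognizer proposes two transcriptions; the language model ranks them.
    good = sentence_score(["i", "really", "admire", "your", "analysis"])
    bad = sentence_score(["i", "really", "admire", "urinalysis"])
    ```

    With these counts, `good` comes out far larger than `bad`, so the recognizer would prefer the sensible transcription even if the acoustics were ambiguous.
    
    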
  • by Sciros ( 986030 ) on Friday May 18, 2007 @05:22PM (#19184943) Journal
    Yeah, Nuance makes good stuff. Well, they've bought up everyone worth anything afaik, so I guess it's only to be expected.
  • by __aajwxe560 ( 779189 ) on Friday May 18, 2007 @05:39PM (#19185183)
    I presently work for a financial firm that is a customer of one of Nuance's enterprise speech recognition products. For several years now, the speech recognition software industry has been consolidating, with Nuance buying up a few different competitors and technologies. Most recently, this dance continued with Nuance being acquired by ScanSoft, a company known for specializing in optical character recognition.

    Nuance support is marginal at best, and through all the consolidations, understanding of how the product works is quite lacking even within their own company. We have often found our own developers educating the Nuance support folks on various aspects of how the product behaves, and then inquiring whether this is intended behavior or not. Crickets can often be heard finishing these types of conversations. We normally would have moved to another product under these conditions, but simply put, Nuance acquired what little was left and now has no competition in the market. Competition is what spurs innovation, so with the continued consolidation it is hard to see significant advances in the technology coming without free help from academia.

    If you think the Microsoft monopoly is bad, imagine if they absorbed Apple and somehow took over Linux, leaving you with a few "choices," all under the Microsoft moniker. The technology is very neat and the enterprise-level products do some basic things quite well, but there is still some glaring room for innovation that I don't expect to see filled anytime soon under present industry conditions.
  • Re:Hmmm.... (Score:5, Informative)

    by cnettel ( 836611 ) on Friday May 18, 2007 @06:44PM (#19185911)
    n-gram based language models are nothing new. Statistics is all fun and dandy, but it's no panacea. It might just be enough to throw in an even larger corpus (something like the complete Google index), but it's still hard. (BTW, n-gram Markov chains more or less originated in speech recognition, to get the individual phonemes right, and I'm quite sure they're doing at least something like it at the word level these days. It still sucks, as the quality users demand for proper dictation is extremely high.)
  • by zuzulo ( 136299 ) on Friday May 18, 2007 @06:57PM (#19186057) Homepage
    The Sphinx project is the current 'gold standard' in open source speech recognition. It can be found at

    Sphinx Project at CMU [sourceforge.net]

    I have used a variety of open source libraries in addition to 'rolling my own' and for general purposes Sphinx is certainly the most mature option.
  • Re:Glottal stops (Score:1, Informative)

    by Anonymous Coward on Friday May 18, 2007 @07:10PM (#19186177)
    Please tell me you're not talking about engrams in Dianetics. I don't think they are ... it sounds like they're talking about n-grams. http://en.wikipedia.org/wiki/N-gram [wikipedia.org].
  • by spywhere ( 824072 ) on Friday May 18, 2007 @07:52PM (#19186581)
    We were testing an edition of Dragon Naturally Speaking back in 2000, when an Asian-American woman on our team took the microphone. She had a heavy accent, and the software interpreted her words as... nothing.

    She stood there, trying to get it to write something, and finally ended up repeating, "It not woking! Why it is not woking?"

    We were afraid to laugh, fearing a trip to HR... we all stood there, biting the insides of our cheeks, until she gave up and left the room; then, we collapsed on the floor, literally ROTFL.
  • No... (Score:1, Informative)

    by MLS100 ( 1073958 ) on Friday May 18, 2007 @08:20PM (#19186763)
    >>Is speech recognition 'good enough'?
    No.
    >>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
    NO.
    >>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
    NO!
    >>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
    HOW ABOUT DIE!
    >>Answer "yes" entered, is this correct?
    No.
    >>I'm sorry I did not understand your selection. Is this correct?
    AJKFLSJFKSLFJSDKFDJSKSFDJK
    >>Thank you for participating in our survey, goodbye.
  • Zeno's Translator (Score:4, Informative)

    by Carcass666 ( 539381 ) on Friday May 18, 2007 @08:28PM (#19186801)

    Speech recognition has been at a standstill for years now; it's been "almost there" for well over five years. As mentioned in other posts, there has been a lot of consolidation, and that has really hurt growth. Lernout & Hauspie and Dragon were constantly going back and forth a few years ago trying to get a leg up on each other. When L&H got into all of their accounting problems and shut down, that left Dragon and IBM. IBM's product went to ScanSoft and then to Nuance, where it languishes until somebody pulls the plug (for example, if you call for support on ViaVoice and mention you have XP SP2, they will tell you it is not a supported platform).

    Most of the improvement in Dragon and ViaVoice over the last couple of years has been in reducing the training required to reach the high-nineties level of accuracy (assuming a noise-cancelling mic, a quiet room, and no cold or sore throat). The advancements in training have not corresponded to much improvement in recognition accuracy: a "trained" Dragon 7 recognizes speech pretty much as well as Dragon 9 (I haven't played with Dragon 10 yet).

    Most of the real speech recognition advancement these days is focused on discrete word sets for voice mail trees and other interactive systems. When you are on the phone giving your credit card number, two/to/too is all the same thing. While speech recognition in its current incarnation is good for people who can't type (disabilities, carpal-tunnel, etc.) it is not a replacement for typing, and isn't any closer today than it was five years ago.

  • You're joking but... (Score:4, Informative)

    by thepotoo ( 829391 ) <thepotoospam@yah[ ]com ['oo.' in gap]> on Saturday May 19, 2007 @12:26AM (#19188189)
    You hit the nail on the head with that one. My sister uses Dragon NaturallySpeaking exclusively (she's dyslexic and can't type or read worth crap, so she has to use Dragon plus Kurzweil, a screen reader).

    Dragon requires MONTHS of training (literally), and even then it will make mistakes exactly like the one you noted. The plus side is that Dragon works pretty decently under WINE, but apart from their Linux "support", it's a complete mess.

    Screen readers aren't much better; they have the accuracy, but are hard to understand.
    For a little geeky fun, I had Kurzweil read a few English papers to Dragon. Even after some training, Dragon still couldn't get above 80% accuracy on a computer generated, 100% reliable, voice. Now that's just sad.
